Category: Benchmark metrics

Decisions, decisions

on September 14, 2017

Back in April, we shared some of our initial ideas for a new version of WebXPRT, and work on the new benchmark is underway. Any time we begin the process of updating one of the XPRT benchmarks, one of the first decisions we face is how to improve workload content so it better reflects the types of technology average consumers use every day. Since benchmarks typically have a life cycle of two to four years, we want the benchmark to be relevant for at least the next couple of years.

For example, WebXPRT contains two photo-related workloads, Photo Effects and Organize Album. Photo Effects applies a series of effects to a set of photos, and Organize Album uses facial recognition technology to analyze a set of photos. In both cases, we want to use photos that represent the most relevant combination of image size, resolution, and data footprint possible. Ideally, the resulting image sizes and resolutions should differentiate processing speed on the latest systems, but not at the expense of being able to run reasonably on most current devices. We also have to confirm that the photos aren’t so large as to impact page load times unnecessarily.

The way this strategy works in practice is that we spend time researching hardware and operating system market share. Given that phones are the cameras that most people use, we look at them to help define photo characteristics. In 2017, the most widespread mobile OS is Android, and while reports vary depending on the metric used, the Samsung Galaxy S5 and Galaxy S7 are at or near the top of global mobile market share. For our purposes, the data tells us that choosing photo sizes and resolutions that mirror those of the Galaxy line is a good start, and a good chunk of Android users are either already using S7-generation technology, or will be shifting to new phones with that technology in the coming year. So, for the next version of WebXPRT, we’ll likely use photos that represent the real-life environment of an S7 user.

I hope that provides a brief glimpse into the strategies we use to evaluate workload content in the XPRT benchmarks. Of course, since the BenchmarkXPRT Development Community is an open development community, we’d love to hear your comments or suggestions!

Justin

Posted in Android, Benchmark metrics, Benchmarking, BenchmarkXPRT development community, Collaborative benchmark development, Performance benchmarking, WebXPRT, WebXPRT 2017, What makes a good benchmark? |

Introducing the XPRT Selector

By Justin Greene

on August 24, 2017

We’re proud of all the XPRT tools, each of which serves a different purpose for the people who rely on them. But for those new to the XPRTs, we wanted a way to make it easy to tell which tool will best suit each person’s specific requirements. To that end, today we’re excited to announce the XPRT Selector, an interactive web tool that helps consumers, developers, manufacturers, and reviewers zero in on exactly which XPRT tool is the right match for their needs.

Using the XPRT Selector is easy. Simply spin the dials on the wheel to choose the categories that best describe yourself, the devices and operating systems you’re working with, and the topic that interests you. Once you’ve aligned the dials, click Get results, and the Selector will present all the free XPRT tools and resources that are available to you. Along with choosing the best tools for you, the XPRT Selector also explains the purpose and capabilities of each tool.

To see the Selector in action, check out the short video below. You can take the XPRT Selector for a spin at http://www.principledtechnologies.com/benchmarkxprt/the-xprt-selector/.

All the XPRT tools have one thing in common: They help take the guesswork out of device evaluation and comparison, making them invaluable for anyone using, making, or writing about tech products. We think the XPRT Selector is a great addition to the fold!

Justin

Posted in Benchmark metrics, Benchmarking, BenchmarkXPRT, BenchmarkXPRT development community, Interactive tools, The XPRT Selector |

Best practices

By Justin Greene

on August 3, 2017

Recently, a tester wrote in and asked for help determining why they were seeing different WebXPRT scores on two tablets with the same hardware configuration. The scores differed by approximately 7.5 percent. This can happen for many reasons, including different software stacks, but score variability can also result from different testing behavior and environments. While some degree of variability is natural, the question provides us with a great opportunity to talk about the basic benchmarking practices we follow in the XPRT lab, practices that contribute to the most consistent and reliable scores.

Below, we list a few basic best practices you might find useful in your testing. While they’re largely in the context of the WebXPRT focus on evaluating browser performance, several of these practices apply to other benchmarks as well.

Test with clean images: We use an out-of-box (OOB) method for testing XPRT Spotlight devices. OOB testing means that other than initial OS and browser version updates that users are likely to run after first turning on the device, we change as little as possible before testing. We want to assess the performance that buyers are likely to see when they first purchase the device, before installing additional apps and utilities. This is the best way to provide an accurate assessment of the performance retail buyers will experience. While OOB is not appropriate for certain types of testing, the key is to not test a device that’s bogged down with programs that influence results unnecessarily.
Turn off updates: We do our best to eliminate or minimize app and system updates after initial setup. Some vendors are making it more difficult to turn off updates completely, but you should always account for update settings.
Get a feel for system processes: Depending on the system and the OS, quite a lot of system-level activity can be going on in the background after you turn it on. As much as possible, we like to wait for a stable baseline (idle) of system activity before kicking off a test. If we start testing immediately after booting the system, we often see higher variability in the first run before the scores start to tighten up.
Disclosure is not just about hardware: Most people know that different browsers will produce different performance scores on the same system. However, testers aren’t always aware of shifts in performance between different versions of the same browser. While most updates don’t have a large impact on performance, a few updates have increased (or even decreased) browser performance by a significant amount. For this reason, it’s always worthwhile to record and disclose the extended browser version number for each test run. The same principle applies to any other relevant software.
Use more than one data point: Because of natural variability, our standard practice in the XPRT lab is to publish a score that represents the median from at least three to five runs. If you run a benchmark only once, and the score differs significantly from other published scores, your result could be an outlier that you would not see again under stable testing conditions.

We hope those tips will make testing a little easier for you. If you have any questions about the XPRTs, or about benchmarking in general, feel free to ask!

Justin

Posted in Benchmark metrics, Benchmarking, Browser-based benchmarks, Performance benchmarking, Performance testing on tablets, Uncategorized, WebXPRT, WebXPRT 2013, WebXPRT 2015, XPRT Weekly Tech Spotlight |

Apples and pears vs. oranges and bananas

By Eric Hale

on July 6, 2017

When people talk about comparing disparate things, they often say that you’re comparing apples and oranges. However, sometimes that expression doesn’t begin to describe the situation.

Recently, Justin wrote about using CrXPRT on systems running Neverware CloudReady OS. In that post, he noted that we couldn’t guarantee that using CrXPRT on CloudReady and Chrome OS systems would be a fair comparison. Not surprisingly, that prompted the question “Why not?”

Here’s the thing: It’s a fair comparison of those software stacks running on those hardware configurations. If everyone accepted that and stopped there, all would be good. However, almost inevitably, people will read more into the scores than is appropriate.

In such a comparison, we’re changing multiple variables at once. We’ve written before about the effect of the software stack on performance. CloudReady and Chrome OS are two different implementations of the Chromium OS, and it’s possible that one is more efficient than the other. If so, that would affect CrXPRT scores. At the same time, the raw performance of the two hardware configurations under test could also differ to a certain degree, which would also affect CrXPRT scores.

Here’s a metaphor: If you measure the effective force at the end of two levers and find a difference, to what do you attribute that difference? If you know the levers are the same length, you can attribute the difference to the amount of applied force. If you know the applied force is identical, you can attribute the difference to the length of the levers. If you lack both of those data points, you can’t know whether the difference is due to the length, the force, or a combination of the two.

With a benchmark, you can run multiple experiments designed to isolate variables and use the results from those experiments to look for trends. For example, we could install both CloudReady OS and Chrome OS on the same Intel-based Chromebook and compare the CrXPRT results. Because that removes hardware differences as a variable, such an experiment would offer some insight into how the two implementations compare. However, because differences in hardware can affect the performance of a given piece of software, this single data point would be of limited value. We could repeat the experiment on a variety of other Intel-based Chromebooks, and other patterns might emerge. If one of the implementations consistently scored higher, that would suggest that it was more efficient than the other, but would still not be definitively conclusive.

I hope this gives you some idea about why we are cautious about drawing conclusions when comparing results from different sets of hardware running different software stacks.

Eric

Posted in Benchmark metrics, Benchmarking, Benchmarks in general, Chrome OS, Chromebooks, CrXPRT, Google Chrome, Performance benchmarking, What makes a good benchmark? |

Learning something new every day

By Justin Greene

on June 1, 2017

We’re constantly learning and thinking about how the XPRTs can help people evaluate the tech that will soon be a part of daily life. It’s why we started work on a tool to evaluate machine learning capabilities, and it’s why we developed CrXPRT in response to Chromebooks’ growing share of the education sector.

The learning process often involves a lot of tinkering in the lab, and we recently began experimenting with Neverware’s CloudReady OS. CloudReady is an operating system based on the open-source Chromium OS. Unlike Chrome OS, which can run on only Chromebooks, CloudReady can run on many types of systems, including older Windows and OS X machines. The idea is that individuals and organizations can breathe new life into aging hardware by incorporating it into a larger pool of devices managed through a Google Admin Console.

We were curious to see if it worked as advertised, and if it would run CrXPRT 2015. Installing CloudReady on an old Dell Latitude E6430 was easy enough, and we then installed CrXPRT from the Chrome Web Store. Performance tests ran without a hitch. Battery life tests would kick off but not complete, which was not a big surprise because the battery life calls involved were developed specifically for Chrome OS.

So, what role can CrXPRT play with CloudReady, and what are the limitations? CloudReady has a lot in common with Chrome OS, but there are some key differences. One way we see the CrXPRT performance test being useful is for comparing CloudReady devices. Say that an organization was considering adopting CloudReady on certain legacy systems but not on others; CrXPRT performance scores would provide insight into which devices performed better with CloudReady. While you could use CrXPRT to compare those devices to Chromebooks, the differences between the operating systems are significant enough that we cannot guarantee the comparison would be a fair one.

Have you spent any time working with CloudReady, or are there other interesting new technologies you’d like us to investigate? Let us know!

Justin

Posted in Battery life, Benchmark metrics, Chrome OS, Chromebooks, CrXPRT, Performance benchmarking |

BatteryXPRT: A quick and reliable way to estimate Android battery life

By Justin Greene

on May 25, 2017

In the last few weeks, we reintroduced readers to the capabilities and benefits of TouchXPRT and CrXPRT. This week, we’d like to reintroduce BatteryXPRT 2014 for Android, an app that evaluates the battery life and performance of Android devices.

When purchasing a phone or tablet, it’s good to know how long the battery will last on a typical day and how often you’ll need to charge it. Before BatteryXPRT, you had to rely on a manufacturer’s estimate or full rundown tests that perform tasks that don’t resemble the types of things we do with our phones and tablets every day.

We developed BatteryXPRT to estimate battery life reliably in just over five hours, so testers can complete a full evaluation in one work day or while sleeping. You can configure it to run while the device is connected to a network or in Airplane mode. The test also produces a performance score by running workloads that represent common everyday tasks.

BatteryXPRT is easy to install and run, and is a great resource for anyone who wants to evaluate how well an Android device will meet their needs. If you’d like to see test results from a variety of Android devices, go to BatteryXPRT.com and click View Results, where you’ll find scores from many different Android devices.

If you’d like to run BatteryXPRT:

Simply download BatteryXPRT from the Google Play store or BatteryXPRT.com. The BatteryXPRT installation instructions and user manual provide step-by-step instructions for how to configure your device and kick off a test. We designed BatteryXPRT 2014 for Android to be compatible with a wide variety of Android devices, but because there are so many devices on the market, it is inevitable that users occasionally run into problems. In the Tips, tricks, and known issues document, we provide troubleshooting suggestions for issues we encountered during development testing.

If you’d like to learn more:

We offer a full online BatteryXPRT training course that covers almost every aspect of the benchmark. You can view the sections in order or jump to the parts that interest you. We guarantee that you’ll learn something new!

If you’d like to dig into the details:

Check out the Exploring BatteryXPRT 2014 for Android white paper. In it, we discuss the app’s development and structure. We also describe the component tests; explain the differences between the test’s Airplane, Wi-Fi, and Cellular modes; and detail the statistical processes we use to calculate expected battery life.

If you’d like to dig even deeper, the BatteryXPRT source code is available to members of the BenchmarkXPRT Development Community, so consider joining today. Membership is free for members of any company or organization with an interest in benchmarks, and there are no obligations after joining.

If you haven’t used BatteryXPRT before, try it out and let us know what you think!

Justin

Posted in Android, Battery life, BatteryXPRT 2014 for Android, Benchmark metrics, Benchmarking, BenchmarkXPRT development community, CrXPRT, TouchXPRT, What makes a good benchmark? |

Category: Benchmark metrics

Decisions, decisions

Introducing the XPRT Selector

Best practices

Apples and pears vs. oranges and bananas

Learning something new every day

BatteryXPRT: A quick and reliable way to estimate Android battery life

If you’d like to run BatteryXPRT:

If you’d like to learn more:

If you’d like to dig into the details:

Check out the other XPRTs: