BenchmarkXPRT Blog

Category: What makes a good benchmark?

Transparent goals

Recently, Forbes published an article discussing a new report on phone battery life from Which?, a UK consumer advocacy group. In the report, Which? states that they tested the talk time battery life of 50 phones from five brands. During the tests, phones from three of the brands lasted longer than the manufacturers’ claims, while phones from another brand underperformed by about five percent. The fifth brand’s published battery life numbers were 18 to 51 percent higher than Which? recorded in their tests.
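To make the "percent higher" comparison concrete, here is a minimal sketch of the arithmetic, assuming the gap is expressed relative to the measured figure. The function name and the sample hours are hypothetical and are not taken from the Which? report.

```python
def overstatement_percent(claimed_hours: float, measured_hours: float) -> float:
    """Return how much higher a claimed figure is than a measured one, in percent.

    Assumption: the gap is expressed relative to the measured value.
    """
    if measured_hours <= 0:
        raise ValueError("measured_hours must be positive")
    return 100.0 * (claimed_hours - measured_hours) / measured_hours


# Hypothetical figures for illustration only.
print(overstatement_percent(claimed_hours=10.0, measured_hours=8.0))  # 25.0
```

Under that assumption, a phone advertised at 10 hours that lasts 8 hours in testing carries a claim 25 percent higher than the measured result.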

Folks can read the article for more details about the tests and the brands. While the report raises some interesting questions, and the article provides readers with brief test methodology descriptions from Which? and one manufacturer, we don’t know enough about the tests to say which set of claims is correct. Any number of variables related to test workloads or device configuration settings could significantly affect the results. Both parties may be using sound benchmarking principles in good faith, but their test methodologies may not be comparable. As it is, we simply don’t have enough information to evaluate the study.

Whether the issue is battery life or any other important device spec, information conflicts, such as the one that the Forbes article highlights, can leave consumers scratching their heads, trying to decide which sources are worth listening to. At the XPRTs, we believe that the best remedy for this type of problem is to provide complete transparency into our testing methodologies and development process. That’s why our lab techs verify all the hardware specs for each XPRT Weekly Tech Spotlight entry. It’s why we publish white papers explaining the structure of our benchmarks in detail, as well as how the XPRTs calculate performance results. It’s also why we employ an open development community model and make each XPRT’s source code available to community members. When we’re open about how we do things, it encourages the kind of honest dialogue between vendors, journalists, consumers, and community members that serves everyone’s best interests.

If you love tech and share that same commitment to transparency, we’d love for you to join our community, where you can access XPRT source code and previews of upcoming benchmarks. Membership is free for anyone with a verifiable corporate affiliation. If you have any questions about membership or the registration process, please feel free to ask.

Justin

BatteryXPRT provides the objective battery life data that shoppers need

Over the last few weeks, we’ve discussed the capabilities and benefits of TouchXPRT and CrXPRT. This week, we’d like to reintroduce readers to BatteryXPRT, our app that evaluates the battery life and performance of Android devices.

Battery life for phones and tablets has improved dramatically over the last several years, to the point where many devices can support continuous use for well over a full work day on a single charge. This improvement is the result of advances in battery hardware technology, increased processor efficiency, and smarter utilization of software services by the operating system. Battery life has increased to some extent for most device categories and price points. However, enough of a range remains between devices at each level that access to objective battery life data is valuable for device shoppers.

Without BatteryXPRT, shoppers must rely on manufacturer estimates or full rundown tests that don’t resemble the types of things we do with our phones and tablets every day. A rundown test that surfs the web continuously for over 15 hours reveals which devices last the longest performing that specific task. It doesn’t tell you which devices last the longest over a full day performing a variety of common activities such as web browsing, watching videos, browsing and editing photos, playing music, and periodically sleeping. During BatteryXPRT’s battery life test, the app executes those same types of tasks and produces a performance score based on the speed with which a device completes each task.

BatteryXPRT provides an intuitive user interface in English and Simplified Chinese, and easy-to-understand results for both battery life and performance. Because your data connection can have a significant effect on battery life, BatteryXPRT can run in airplane mode, while connected to the Internet via Wi-Fi, or while connected to the Internet through a cellular data connection.

BatteryXPRT is easy to install and run, and is a great resource for anyone who wants to evaluate how well an Android device will meet their needs. If you’d like to see test results from a variety of Android devices, go to BatteryXPRT.com and click View Results.

If you’d like to run BatteryXPRT

Simply download BatteryXPRT from the Google Play store or BatteryXPRT.com. The BatteryXPRT installation instructions and user manual provide step-by-step instructions for configuring your device and kicking off a test. We designed BatteryXPRT to be compatible with a wide variety of Android devices, but because there are so many devices on the market, it is inevitable that users occasionally run into problems. In the Tips, tricks, and known issues document, we provide troubleshooting suggestions for issues we encountered during development testing.

If you’d like to learn more

The Exploring BatteryXPRT 2014 for Android white paper covers almost every aspect of the benchmark. In it, we explain the guiding concepts behind BatteryXPRT’s development, as well as the benchmark’s structure. We describe the component tests, the differences between the app’s Airplane and Network/Wi-Fi modes, and the statistical processes used to calculate expected battery life.

Justin

CrXPRT is more valuable than ever

Digital Trends recently published an article discussing various rumors about the future of the Google Pixelbook line. Pixelbooks were some of the first Chromebooks with high-end hardware specs, and they were priced accordingly. Whether or not the rumors discussed in the article turn out to be true, the author points out that the Pixelbook prompted several other vendors, such as HP and Lenovo, to take a chance on high-end Chromebooks. It seems like high-end Chromebooks are here to stay, but given the unique constraints of the Chrome OS environment, buyers are often unsure if it’s worth it to shell out the extra money for a premium model.

We developed CrXPRT to help buyers answer these questions. CrXPRT is a benchmark tool that measures the battery life of your Chromebook as well as how fast it handles everyday tasks like playing video games, watching movies, editing pictures, and doing homework. The performance test gives you individual workload scores and an overall score based on the speed of the device. The battery life test produces an estimated battery life time, a separate performance score, and a frames-per-second (FPS) rate for a built-in HTML5 gaming component.

You don’t have to be a tech journalist or even a techie to use CrXPRT. To learn more, check out the links below.

Testing the performance or battery life of your Chromebook

Simply download CrXPRT from the Chrome Web Store. Installation is quick and easy, and the CrXPRT 2015 user manual provides step-by-step instructions. A typical performance test takes about 15 minutes, and a battery life test will take 3.5 hours once the system is charged and configured for testing. If you’d like to see how your score compares to other Chromebooks, visit the CrXPRT results page.

Want to know more?

Read the Exploring CrXPRT 2015 white paper, where we discuss the concepts behind CrXPRT, its development process, and the app’s structure. We also describe the component tests and explain the statistical processes used to calculate expected battery life.

BenchmarkXPRT Development Community members also have access to the CrXPRT source code, so if you’re interested, join today! There’s no obligation and membership is free for members of any company or organization with an interest in benchmarks.

Give CrXPRT a try and let us know what you think!

Justin

The AIXPRT Request for Comments preview build

In the next few days, we’ll be publishing a Request for Comments (RFC) preview build of the first AIXPRT tool, an early version of one of the tools we’re developing to help evaluate machine learning performance.

We’re inviting folks to run the workload and send in their thoughts and suggestions. Only BenchmarkXPRT Development Community members have access to our RFCs and the opportunity to provide feedback. However, because we’re seeking broad input from experts in this field, we’ll gladly make anyone interested in participating a member.

This AIXPRT RFC preview build includes support for the Intel OpenVINO computer vision toolkit to run image classification and object detection workloads with ResNet-50 and SSD-MobileNet v1 networks. The test reports results at FP32 and FP16 levels of precision. The system requirements are:

  • Operating system = Ubuntu 16.04
  • CPU = 6th to 8th generation Intel Core or Xeon processors, or Intel Pentium processors N4200/5, N3350/5, N3450/5 with Intel HD Graphics


We welcome input on all aspects of the benchmark, including scope, workloads, metrics and scores, user experience, and reporting. We will add support for TensorFlow and TensorRT to the AIXPRT RFC preview build during the preview period. We are accepting feedback through January 25th, 2019, after which we’ll collect and evaluate responses before publishing the next build. Because this is an RFC release, we ask that testers not publish scores or use the results for comparison purposes.

We’ll send out a community announcement when the RFC preview build is officially available, and we’ll also post an announcement and RFC preview build user guide on AIXPRT.com. We’re hosting the AIXPRT RFC preview build in a dedicated GitHub repository, so please contact us at BenchmarkXPRTsupport@principledtechnologies.com to gain access.

This is just the next step for AIXPRT. With your help, we hope to add more workloads and other frameworks in the coming months. We look forward to receiving your feedback!

Bill

Notes from the lab: choosing a calibration system for MobileXPRT 3

Last week, we shared some details about what to expect in MobileXPRT 3. This week, we want to provide some insight into one part of the MobileXPRT development process: choosing a calibration system.

First, some background. For each of the benchmarks in the XPRT family, we select a calibration system using criteria we’ll explain below. This system serves as a reference point, and we use it to calculate scores that will help users understand a single benchmark result. The calibration system for MobileXPRT 2015 is the Motorola DROID RAZR M. We structured our calculation process so that the mean performance score from repeated MobileXPRT 2015 runs on that device is 100. A device that completes the same workloads 20 percent faster than the DROID RAZR M would have a performance score of 120, and one that performs the test 20 percent more slowly would have a score of 80. (You can find a more in-depth explanation of MobileXPRT score calculations in the Exploring MobileXPRT 2015 white paper.)
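The scoring convention described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the score scales linearly with a device's throughput relative to the calibration device. The function name and the sample throughput values are hypothetical, and the actual MobileXPRT calculation is the one documented in the white paper.

```python
def calibrated_score(device_rate: float, calibration_rate: float) -> float:
    """Scale a device's throughput so that the calibration device scores 100.

    Assumption: the score is linear in relative throughput (work completed
    per unit of time). This illustrates the 100/120/80 example only; it is
    not the actual MobileXPRT formula.
    """
    if device_rate <= 0 or calibration_rate <= 0:
        raise ValueError("rates must be positive")
    return 100.0 * device_rate / calibration_rate


# Hypothetical throughput values, in arbitrary units of work per second.
calibration_rate = 50.0
print(calibrated_score(50.0, calibration_rate))  # calibration device: 100.0
print(calibrated_score(60.0, calibration_rate))  # 20 percent faster: 120.0
print(calibrated_score(40.0, calibration_rate))  # 20 percent slower: 80.0
```

Anchoring the scale to a single known device is what lets a lone number such as 120 carry meaning without a second result to compare against.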

When selecting a calibration device, we are looking for a relevant reference point in today’s market. The device should be neither too slow to handle modern workloads nor so fast that it outscores most devices on the market. It should represent a level of performance that is close to what the majority of consumers experience, and one that will continue to be relevant for some time. This approach helps to build context for the meaning of the benchmark’s overall score. Without that context, testers can’t tell whether a score is fast or slow just by looking at the raw number. When compared to a well-known standard such as the calibration device, however, the score has more informative value.

To determine a suitable calibration device for MobileXPRT 3, we started by researching the most popular Android phones by market share around the world. It soon became clear that in many major markets, the Samsung Galaxy S8 ranked first or second, or at least appeared in the top five. As last year’s first Samsung flagship, the S8 is no longer on the cutting edge, but it has specs that many current mid-range phones are deploying, and the hardware should remain relevant for a couple of years.

For all of these reasons, we made the Samsung Galaxy S8 the calibration device for MobileXPRT 3. The model in our lab has a Qualcomm Snapdragon 835 SoC and 4 GB of RAM, and runs Android 7.0 (Nougat). We think it has the balance we’re looking for.

If you have any questions or concerns about MobileXPRT 3, calibration devices, or score calculations, please let us know. We look forward to sharing more information about MobileXPRT 3 as we get closer to the community preview.

Justin

Which browser is the fastest? It’s complicated.

PCWorld recently published the results of a head-to-head browser performance comparison between Google Chrome, Microsoft Edge, Mozilla Firefox, and Opera. As we’ve noted about similar comparisons, no single browser was the fastest in every test. Browser speed sounds like a straightforward metric, but the reality is complex.

For the comparison, PCWorld used three JavaScript-centric test suites (JetStream, SunSpider, and Octane), one benchmark that simulates user actions (Speedometer), a few in-house tests of their own design, and one benchmark that simulates real-world web applications (WebXPRT). Edge came out on top in JetStream and SunSpider, Opera won in Octane and WebXPRT, and Chrome had the best results in Speedometer and PCWorld’s custom workloads.
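Tallying the winners listed above makes the split concrete. The short Python sketch below uses only the outcomes reported in this post; the variable names are illustrative, and the tally says nothing about margins of victory.

```python
from collections import Counter

# Winning browser for each test, as summarized in the paragraph above.
winners = {
    "JetStream": "Edge",
    "SunSpider": "Edge",
    "Octane": "Opera",
    "WebXPRT": "Opera",
    "Speedometer": "Chrome",
    "PCWorld custom workloads": "Chrome",
}

# Count first-place finishes; browsers with no wins report zero.
wins = Counter(winners.values())
for browser in ("Chrome", "Edge", "Firefox", "Opera"):
    print(f"{browser}: {wins[browser]} win(s)")
```

Three of the four browsers finish with two wins apiece, so a simple count of first-place finishes cannot name an overall winner.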

The reason that the benchmarks rank the browsers so differently is that each one has a unique emphasis and tests a specific set of workloads and technologies. Some focus on very low-level JavaScript tasks, some test additional technologies such as HTML5, and some are designed to identify strengths or weaknesses by stressing devices in unusual ways. These approaches are all valid, and it’s important to understand exactly what a given score represents. Some scores reflect a very broad set of metrics, while others assess a very narrow set of tasks. Some scores help you to understand the performance you can expect from a device in your everyday life, and others measure performance in scenarios that you’re unlikely to encounter. For example, when Eric discussed a similar topic in the past, he said the tests in JetStream 1.1 provided information that “can be very useful for engineers and developers, but may not be as meaningful to the typical user.”

As we do with all the XPRTs, we designed WebXPRT to test how devices handle the types of real-world tasks consumers perform every day. While lab techs, manufacturers, and tech journalists can all glean detailed data from WebXPRT, the test’s real-world focus means that the overall score is relevant to the average consumer. Simply put, a device with a higher WebXPRT score is probably going to feel faster to you during daily use than one with a lower score. In today’s crowded tech marketplace, that piece of information provides a great deal of value to many people.

What are your thoughts on browser testing? We’d love to hear from you.

Justin
