
Category: Benchmark metrics

Understanding AIXPRT results

Last week, we discussed the changes we made to the AIXPRT Community Preview 2 (CP2) download page as part of our ongoing effort to make AIXPRT easier to use. This week, we want to discuss the basics of understanding AIXPRT results by talking about the numbers that really matter and how to access and read the actual results files.

To understand AIXPRT results at a high level, it’s important to revisit the core purpose of the benchmark. AIXPRT’s bundled toolkits measure inference latency (how long it takes to process a single input) and throughput (the number of images processed in a given period) for image recognition (ResNet-50) and object detection (SSD-MobileNet v1) tasks. Testers have the option of adjusting variables such as batch size (the number of input samples to process simultaneously) to try to achieve higher levels of throughput, but higher throughput can come at the expense of increased latency per task. In real-time or near real-time use cases, such as performing image recognition on individual photos as a camera captures them, lower latency is important because it improves the user experience. In other cases, such as performing image recognition on a large library of photos, higher throughput might be preferable; designating larger batch sizes or running concurrent instances might allow the overall workload to complete more quickly.
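To make that tradeoff concrete, here’s a quick sketch in Python with made-up numbers (these are not AIXPRT figures) showing how batch size shifts the balance between throughput and latency:

    # Hypothetical numbers for illustration only; real results depend on the
    # hardware, toolkit, and network under test.
    runs = [
        {"batch_size": 1,  "seconds_per_batch": 0.010},
        {"batch_size": 32, "seconds_per_batch": 0.200},
    ]

    for run in runs:
        batch = run["batch_size"]
        secs = run["seconds_per_batch"]
        throughput = batch / secs      # images processed per second
        latency_ms = secs * 1000.0     # time until the whole batch finishes
        print(f"batch {batch:>2}: {throughput:6.1f} images/s, "
              f"{latency_ms:6.1f} ms per batch")

In this example, the batch of 32 delivers higher throughput (160 versus 100 images per second), but any single image may wait 200 ms for its batch to finish, which is exactly the tradeoff we described above.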

These performance tradeoffs mean that there is no single score that is good for every machine learning scenario. Some testers will prioritize lower latency, while others will sacrifice latency to achieve the higher throughput that their use case demands.

Testers can find latency and throughput numbers for each completed run in a JSON results file in the AIXPRT/Results folder; the test also generates CSV results files in the same folder. The raw results files report values for each AI task configuration (e.g., ResNet-50, Batch1, on CPU). Parsing and consolidating the raw data can take some time, so we’re developing a results file parsing tool to make the job much easier.
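In the meantime, if you’d like to consolidate the numbers yourself, a few lines of Python can pull the key values out of the JSON file. The snippet below assumes a simplified, hypothetical file name and schema, so adjust both to match the results file your run actually produces:

    import json

    # Hypothetical file name and schema for illustration; check your
    # AIXPRT/Results folder for the actual file and field names.
    with open("AIXPRT/Results/results.json") as f:
        data = json.load(f)

    for entry in data["results"]:
        label = f'{entry["network"]}, Batch{entry["batch_size"]}, {entry["device"]}'
        print(f'{label}: {entry["throughput"]:.1f} images/s, '
              f'{entry["latency_ms"]:.1f} ms')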

The results parsing tool is currently available in the AIXPRT CP2 OpenVINO – Windows package, and we hope to make it available for more packages soon. Using the tool is as simple as running a single command, and detailed instructions for how to do so are in the AIXPRT OpenVINO on Windows user guide. The tool produces a summary (example below) that makes it easier to quickly identify relevant comparison points such as maximum throughput and minimum latency.

AIXPRT results summary

In addition to the summary, the tool displays the throughput and latency results for each AI task configuration tested by the benchmark. AIXPRT runs each AI task multiple times and reports the average inference throughput and corresponding latency percentiles.

AIXPRT results details
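If you’d like to reproduce statistics like these from your own raw timings, the standard approach is to average the throughput figures and take percentiles of the latencies. Here’s a minimal sketch (not AIXPRT’s internal code) that uses Python’s statistics module on a set of made-up per-inference latencies:

    import statistics

    # Hypothetical per-inference latencies (in milliseconds) from repeated runs.
    latencies_ms = [9.8, 10.1, 10.4, 10.0, 12.6, 10.2, 10.3, 11.9, 10.1, 10.0]

    mean_ms = statistics.mean(latencies_ms)
    # quantiles(n=100) returns the 1st through 99th percentiles; index 49 is
    # the 50th percentile, index 89 the 90th, and index 98 the 99th.
    pct = statistics.quantiles(latencies_ms, n=100)
    print(f"mean: {mean_ms:.1f} ms, p50: {pct[49]:.1f} ms, "
          f"p90: {pct[89]:.1f} ms, p99: {pct[98]:.1f} ms")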

We hope that this information helps to make it easier to understand AIXPRT results. If you have any questions or comments, please feel free to contact us.

Justin

Transparent goals

Recently, Forbes published an article discussing a new report on phone battery life from Which?, a UK consumer advocacy group. In the report, Which? states that they tested the talk time battery life of 50 phones from five brands. During the tests, phones from three of the brands lasted longer than the manufacturers’ claims, while phones from another brand underperformed by about five percent. The fifth brand’s published battery life numbers were 18 to 51 percent higher than Which? recorded in their tests.

Folks can read the article for more details about the tests and the brands. While the report raises some interesting questions, and the article provides readers with brief test methodology descriptions from Which? and one manufacturer, we don’t know enough about the tests to say which set of claims is correct. Any number of variables related to test workloads or device configuration settings could significantly affect the results. Both parties may be using sound benchmarking principles in good faith, but their test methodologies may not be comparable. As it is, we simply don’t have enough information to evaluate the study.

Whether the issue is battery life or any other important device spec, information conflicts, such as the one that the Forbes article highlights, can leave consumers scratching their heads, trying to decide which sources are worth listening to. At the XPRTs, we believe that the best remedy for this type of problem is to provide complete transparency into our testing methodologies and development process. That’s why our lab techs verify all the hardware specs for each XPRT Weekly Tech Spotlight entry. It’s why we publish white papers explaining the structure of our benchmarks in detail, as well as how the XPRTs calculate performance results. It’s also why we employ an open development community model and make each XPRT’s source code available to community members. When we’re open about how we do things, it encourages the kind of honest dialogue between vendors, journalists, consumers, and community members that serves everyone’s best interests.

If you love tech and share that same commitment to transparency, we’d love for you to join our community, where you can access XPRT source code and previews of upcoming benchmarks. Membership is free for anyone with a verifiable corporate affiliation. If you have any questions about membership or the registration process, please feel free to ask.

Justin

BatteryXPRT provides the objective battery life data that shoppers need

Over the last few weeks, we’ve discussed the capabilities and benefits of TouchXPRT and CrXPRT. This week, we’d like to reintroduce readers to BatteryXPRT, our app that evaluates the battery life and performance of Android devices.

Battery life for phones and tablets has improved dramatically over the last several years, to the point where many devices can support continuous use for well over a full work day on a single charge. This improvement is the result of advances in battery hardware technology, increased processor efficiency, and smarter utilization of software services by the operating system. Battery life has increased to some extent for most device categories and price points. However, enough of a range remains between devices at each level that access to objective battery life data is valuable for device shoppers.

Without BatteryXPRT, shoppers must rely on manufacturer estimates or full rundown tests that don’t resemble the ways we actually use our phones and tablets every day. A rundown test that browses the web continuously for more than 15 hours reveals which devices last the longest at that one task. It doesn’t tell you which devices last the longest over a full day of varied, common activities such as web browsing, watching videos, browsing and editing photos, playing music, and periodically sleeping. During BatteryXPRT’s battery life test, the app executes those same types of tasks and produces a performance score based on how quickly the device completes each one.

BatteryXPRT provides an intuitive user interface in English and Simplified Chinese, and easy-to-understand results for both battery life and performance. Because your data connection can have a significant effect on battery life, BatteryXPRT can run in airplane mode, connected to the Internet via Wi-Fi, or connected to the Internet through a cellular data connection.

BatteryXPRT is easy to install and run, and it’s a great resource for anyone who wants to evaluate how well an Android device will meet their needs. To see test results from a wide variety of Android devices, go to BatteryXPRT.com and click View Results.

If you’d like to run BatteryXPRT

Simply download BatteryXPRT from the Google Play store or BatteryXPRT.com. The BatteryXPRT installation instructions and user manual provide step-by-step instructions for configuring your device and kicking off a test. We designed BatteryXPRT to be compatible with a wide variety of Android devices, but because there are so many devices on the market, it is inevitable that users occasionally run into problems. In the Tips, tricks, and known issues document, we provide troubleshooting suggestions for issues we encountered during development testing.

If you’d like to learn more

The Exploring BatteryXPRT 2014 for Android white paper covers almost every aspect of the benchmark. In it, we explain the guiding concepts behind BatteryXPRT’s development, as well as the benchmark’s structure. We describe the component tests, the differences between the app’s Airplane and Network/Wi-Fi modes, and the statistical processes used to calculate expected battery life.
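To give a flavor of the kind of calculation the white paper describes, one common way to estimate expected battery life (a simplified illustration here, not necessarily BatteryXPRT’s exact method) is to extrapolate linearly from the battery drain observed during a fixed-length test:

    # Hypothetical readings for illustration.
    test_hours = 5.5          # length of the battery life test
    start_charge_pct = 100.0  # battery level when the test began
    end_charge_pct = 62.0     # battery level when the test ended

    drain_pct = start_charge_pct - end_charge_pct
    expected_life_hours = test_hours * (100.0 / drain_pct)
    print(f"Expected battery life: {expected_life_hours:.1f} hours")  # ~14.5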

Justin

TouchXPRT: a great tool for evaluating Windows performance

From time to time, we remember that some XPRT users have experience with only one or two of the benchmark tools in our portfolio. They might have bookmarked a link to WebXPRT they found in a tech review or copied the HDXPRT installer package from a flash drive in their lab, but be unaware of other members of the XPRT family that could be useful to them. To spread the word about the range of capabilities the XPRTs offer, we occasionally highlight one of the XPRT tools in the blog. Last week, we discussed CrXPRT, a benchmark for evaluating the performance and battery life of Chrome OS devices. Today, we focus on TouchXPRT, our app for evaluating the performance of Windows 10 devices.

While our first benchmark, HDXPRT, is a great tool for assessing how well Windows machines handle media creation tasks using real commercial applications, it’s simply too large to run on most Windows tablets, 2-in-1s, and laptops with limited memory. To test those devices, we developed the latest version of TouchXPRT as a Universal Windows Platform (UWP) app, which makes installation quick and easy. TouchXPRT runs five tests that simulate common photo, video, and music editing tasks; measures how quickly the device completes each of those tasks; and provides an overall score. A full run takes about 15 minutes on most devices. Labs can also automate testing using the command line or a script.

Want to run TouchXPRT?

Download TouchXPRT from the Microsoft Store or from TouchXPRT.com. The TouchXPRT 2016 release notes provide step-by-step instructions. To compare device scores, go to the TouchXPRT 2016 results page, where you’ll find scores from many Windows 10 devices.

Want to dig into the details?

Check out the Exploring TouchXPRT 2016 white paper. In it, we discuss the TouchXPRT development process, its component tests and workloads, and how it calculates individual workload and overall scores. We also provide instructions for automated testing.
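To give a sense of how the scoring discussion fits together, here’s a minimal sketch of a geometric-mean rollup, a common way for benchmarks to combine several workload results into one number (the workload names and scores below are hypothetical; see the white paper for TouchXPRT’s actual calculation):

    import math

    # Hypothetical workload names and scores for illustration only.
    workload_scores = {
        "photo_edit": 310,
        "photo_blend": 280,
        "video_convert": 295,
        "music_edit": 330,
        "photo_slideshow": 305,
    }

    # Geometric mean: the nth root of the product of n scores. It keeps a
    # single unusually high or low workload from dominating the overall result.
    overall = math.exp(
        sum(math.log(s) for s in workload_scores.values()) / len(workload_scores)
    )
    print(f"Overall score: {overall:.0f}")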

BenchmarkXPRT Development Community members also have access to the TouchXPRT source code, so consider joining the community today. There’s no obligation, and membership is free for anyone from a company or organization with an interest in benchmarks.

If you’ve been looking for a Windows performance evaluation tool that’s easy to use and has the flexibility of a UWP app, give TouchXPRT a try and let us know what you think!

Justin

All about the AIXPRT Community Preview

Last week, Bill discussed our plans for the AIXPRT Community Preview (CP). I’m happy to report that, despite some last-minute tweaks and testing, we’re close to being on schedule. We expect to take the CP build live in the coming days, and will send a message to community members to let them know when the build is available in the AIXPRT GitHub repository.

As we mentioned last week, the AIXPRT CP build includes support for the Intel OpenVINO, TensorFlow (CPU and GPU), and TensorFlow with NVIDIA TensorRT toolkits, which run image recognition workloads with the ResNet-50 network and object detection workloads with the SSD-MobileNet v1 network. The test reports results at FP32, FP16, and INT8 precision levels. Although the minimum CPU and GPU requirements vary by toolkit, the test systems must be running Ubuntu 16.04 LTS. You’ll be able to find more detail on those requirements in the installation instructions that we’ll post on AIXPRT.com.

We’re making the AIXPRT CP available to anyone interested in participating, but you must have a GitHub account. To gain access to the CP, please contact us and let us know your GitHub username. Once we receive it, we’ll send you an invitation to join the repository as a collaborator.

We’re allowing folks to quote test results during the CP period, and we’ll publish results from our lab and other members of the community at AIXPRT.com. Because this testing involves so many complex variables, we may contact testers if we see published results that differ significantly from those of comparable systems. On the AIXPRT results page, we’ll provide detailed instructions on how to send in your results for publication on our site. For each set of results we receive, we’ll disclose all of the detailed test, software, and hardware information that the tester provides. Our goal is to make it possible for others to reproduce the test and confirm that they get similar numbers.

If you make changes to the code during testing, we ask that you email us and describe those changes; we’ll evaluate whether they should become part of AIXPRT. We also require that users not publish results from modified versions of the code during the CP period.

We expect the AIXPRT CP period to last about four to six weeks, placing the public release around the end of March or beginning of April. In the meantime, we welcome your thoughts and suggestions about all aspects of the benchmark.

Please let us know if you have any questions. Stay tuned to AIXPRT.com and the blog for more developments, and we look forward to seeing your results!

JNG

Out with the old, and in with the new

What we now know as the BenchmarkXPRT Development Community started many years ago as the HDXPRT Development Community forum. At the time, the community was much smaller, and HDXPRT was our only benchmark. When a member wanted to run the benchmark, they submitted a request, and then received an installation DVD in the mail.

With hundreds of members, more than a half dozen active benchmarks, and the online availability of all our tools, the current community is a much different organization. Instead of the original forum, most of our interaction with members takes place through the blog, the monthly newsletter, direct email, and our social media accounts. Because of the way the community has changed, and because the original forum is no longer very active, we believe that the time and resources that we devote to maintaining the forum could be better spent on building and maintaining other community assets. To that end, we’ve decided to end support for the original BenchmarkXPRT forum.

As always, community members’ voices are an important consideration in what we do. If you have any questions or concerns about the decision to close down the original forum, please let us know as soon as possible.

On another note, we want to thank the community members who’ve participated in the HDXPRT 4 Community Preview. Testing has gone well, and we’re planning to release HDXPRT 4 to the public towards the end of next week!

Justin
