Category: Benchmarking

Learning about machine learning

Everywhere we look, machine learning is in the news. It’s driving cars and beating the world’s best Go players. Whether we are aware of it or not, it’s in our lives, understanding our voices and identifying our pictures.

Our goal of being able to measure the performance of hardware and software that does machine learning seems more relevant than ever. Our challenge is to scan the vast landscape that is machine learning, and identify which elements to measure first.

There is a natural temptation to see machine learning as being all about neural networks such as AlexNet and GoogLeNet. However, new innovations appear all the time and lots of important work with more classic machine learning techniques is also underway. (Classic machine learning being anything more than a few years old!) Recurrent neural networks used for language translation, reinforcement learning used in robotics, and support vector machine (SVM) learning used in text recognition are just a few examples among the wide array of algorithms to consider.

Creating a benchmark or set of benchmarks to cover all those areas, however, is unlikely to be possible. Certainly, creating such an ambitious tool would take so long that it would be of limited usefulness.

Our current thinking is to begin with a small set of representative algorithms. The challenge, of course, is identifying them. That’s where you come in. What would you like to start with?

We anticipate that the benchmark will focus on the types of inference and light training that are likely to occur on edge devices. Extensive training with large datasets takes place in data centers or on systems with extraordinary computing capabilities. We’re interested in use cases that will stress the local processing power of everyday devices.

We are, of course, reaching out to folks in the machine learning field—including those in academia, those who create the underlying hardware and software, and those who make the products that rely on that hardware and software.

What do you think?

Bill

Evolve or die

Last week, Google announced that it would retire its Octane benchmark. Their announcement explains that they designed Octane to spur improvement in JavaScript performance, and while it did just that when it was first released, those improvements have plateaued in recent years. They also note that there are some operations in Octane that optimize Octane scores but do not reflect real-world scenarios. That’s unfortunate, because they, like most of us, want improvements in benchmark scores to mean improvements in end-user experience.

WebXPRT comes at the web performance issue differently. While Octane’s goal was to improve JavaScript performance, the purpose of WebXPRT is to measure performance from the end user’s perspective. By doing the types of work real people do, WebXPRT doesn’t measure only improvements in JavaScript performance; it also measures the quality of the real-world user experience. WebXPRT’s results also reflect the performance of the entire device and software stack, not just the performance of the JavaScript interpreter.

Google’s announcement reminds us that benchmarks have finite life spans and that they must constantly evolve to keep pace with changes in technology or become useless. To make sure the XPRT benchmarks do just that, we are always looking at how people use their devices and developing workloads that reflect their actions. This is a core element of the XPRT philosophy.

As we mentioned last week, we’ve been working on the next version of WebXPRT. If you have any thoughts about how it should evolve, let us know!

Eric

Thinking ahead to WebXPRT 2017

A few months ago, Bill discussed our intention to update WebXPRT this year. Today, we want to share some initial ideas for WebXPRT 2017 and ask for your input.

Updates to the workloads provide an opportunity to increase the relevance and value of WebXPRT in the years to come. Here are a few of the ideas we’re considering:

  • For the Photo Enhancement workload, we can increase the data sizes of pictures. We can also experiment with additional types of photo enhancement such as background/foreground subtraction, collage creation, or panoramic/360-degree image viewing.
  • For the Organize Album workload, we can explore machine learning workloads by incorporating open source JavaScript libraries into web-based inferencing tests.
  • For the Local Notes workload, we’re investigating the possibility of leveraging natural-brain libraries for language processing functions.
  • For a new workload, we’re investigating the possibility of using online 3D modeling applications such as Tinkercad.

 
For the UI, we’re considering improvements to features like the in-test progress bars and individual subtest selection. We’re also planning to update the UI to make it visually distinct from older versions.

Throughout this process, we want to be careful to maintain the features that have made WebXPRT our most popular tool, with more than 141,000 runs to date. We’re committed to making sure that it runs quickly and simply in most browsers and produces results that are useful for comparing web browsing performance across a wide variety of devices.

Do you have feedback on these ideas or suggestions for browser technologies or test scenarios that we should consider for WebXPRT 2017? Are there existing features we should ditch? Are there elements of the UI that you find especially useful or would like to see improved? Please let us know. We want to hear from you and make sure that we’re crafting a performance tool that continues to meet your needs.

Justin

Looking under the hood

In the next couple of weeks, we’ll publish the source code and build instructions for the latest HDXPRT 2014 and BatteryXPRT 2014 builds. Access to XPRT source code is one of the benefits of BenchmarkXPRT Development Community membership. For readers who may not know, this is a good time to revisit the reasons we make the source code available.

The primary reason is transparency; we want the XPRTs to be as open as possible. As part of our community model for software development, the source code is available to anyone who joins the community. Closed-source benchmark development can lead some people to infer that a benchmark is biased in some way. Our approach makes it impossible to hide any biases.

Another reason we publish source code is to encourage collaborative development and innovation. Community members are involved in XPRT development from the beginning, helping to identify emerging technologies in need of reliable benchmarking tools, suggesting potential workloads and improvements, reviewing design documents, and offering all sorts of general feedback.

Simply put, if you’re interested in benchmarking and the BenchmarkXPRT Development Community, then we’re interested in what you have to say! Community input helps us at every step of the process, and ultimately helps us to create benchmarking tools that are as reliable and relevant as possible.

If you’d like to review XPRT source code, but haven’t yet joined the community, we encourage you to go ahead and join! It’s easy, and if you work for a company or organization with an interest in benchmarking, you can join the community for free. Simply fill out the form with your company e-mail address and click the option to be considered for a free membership. We’ll contact you to verify the address is real and then activate your membership.

If you have any other questions about community membership or XPRT source code, feel free to contact us. We look forward to hearing from you!

Justin

Digging deeper

From time to time, we like to revisit the fundamentals of the XPRT approach to benchmark development. Today, we’re discussing the need for testers and benchmark developers to consider the multiple factors that influence benchmark results. For every device we test, all of its hardware and software components have the potential to affect performance, and changing the configuration of those components can significantly change results.

For example, we frequently see significant performance differences between different browsers on the same system. In our recent recap of the XPRT Weekly Tech Spotlight’s first year, we highlighted an example of how testing the same device with the same benchmark can produce different results, depending on the software stack under test. In that instance, the Alienware Steam Machine entry included a WebXPRT 2015 score for each of the two browsers that consumers were likely to use. The first score (356) represented the SteamOS browser app in the SteamOS environment, and the second (441) represented the Iceweasel browser (a Firefox variant) in the Linux-based desktop environment. Including only the first score would have given readers an incomplete picture of the Steam Machine’s web-browsing capabilities, so we thought it was important to include both.
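To put the gap between those two scores in perspective, here is a minimal Python sketch that expresses one score as a percentage change from the other. The function name `percent_difference` and the variable names are hypothetical, chosen only for illustration; the two scores are the ones quoted above.

```python
def percent_difference(baseline: float, other: float) -> float:
    """Return how much `other` differs from `baseline`, as a percentage of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline score must be nonzero")
    return (other - baseline) / baseline * 100.0


# WebXPRT 2015 scores quoted above for the Alienware Steam Machine.
steamos_browser_score = 356  # SteamOS browser app
iceweasel_score = 441        # Iceweasel in the Linux-based desktop environment

difference = percent_difference(steamos_browser_score, iceweasel_score)
print(f"Iceweasel scored {difference:.1f}% higher than the SteamOS browser app")
```

On these numbers, the Iceweasel score is roughly 24 percent higher, which is why reporting only one of the two would have been misleading.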

We also see performance differences between different versions of the same browser, a fact especially relevant to those who use frequently updated browsers, such as Chrome. Even benchmarks that measure the same general area of performance, for example, web browsing, are usually testing very different things.

OS updates can also have an impact on performance. Consumers might base a purchase on performance or battery life scores and end up with a device that behaves much differently when updated to a new version of Android or iOS, for example.

Other important factors in the software stack include pre-installed software, commonly referred to as bloatware, and the proliferation of apps that sap performance and battery life.

This is a much larger topic than we can cover in the blog. Let the examples we’ve mentioned remind you to think critically about, and dig deeper into, benchmark results. If we see published XPRT scores that differ significantly from our own results, our first question is always “What’s different between the two devices?” Most of the time, the answer becomes clear as we compare hardware and software from top to bottom.

Justin

Celebrating one year of the XPRT Weekly Tech Spotlight

It’s been just over a year since we launched the XPRT Weekly Tech Spotlight by featuring our first device, the Google Pixel C. Spotlight has since become one of the most popular items at BenchmarkXPRT.com, and we thought now would be a good time to recap the past year, offer more insight into the choices we make behind the scenes, and look at what’s ahead for Spotlight.

The goal of Spotlight is to provide PT-verified specs and test results that can help consumers make smart buying decisions. We try to include a wide variety of device types, vendors, software platforms, and price points in our inventory. The devices also tend to fall into one of two main groups: popular new devices generating a lot of interest and devices that have unique form factors or unusual features.

To date, we’ve featured 56 devices: 16 phones, 11 laptops, 10 two-in-ones, 9 tablets, 4 consoles, 3 all-in-ones, and 3 small-form-factor PCs. The operating systems these devices run include Android, ChromeOS, iOS, macOS, OS X, Windows, and an array of vendor-specific OS variants and skins.

As much as possible, we test using out-of-the-box (OOB) configurations. We want to present test results that reflect what everyday users will experience on day one. Depending on the vendor, the OOB approach can mean that some devices arrive bogged down with bloatware while others are relatively clean. We don’t attempt to “fix” anything in those situations; we simply test each device “as is” when it arrives.

If devices arrive with outdated OS versions (as is often the case with Chromebooks), we update to current versions before testing, because that’s the best reflection of what everyday users will experience. In the past, that approach would’ve been more complicated with Windows systems, but the Microsoft shift to “Windows as a service” ensures that most users receive significant OS updates automatically by default.

The OOB approach also means that the WebXPRT scores we publish reflect the performance of each device’s default browser, even if it’s possible to install a faster browser. Our goal isn’t to perform a browser shootout on each device, but to give an accurate snapshot of OOB performance. For instance, last week’s Alienware Steam Machine entry included two WebXPRT scores, a 356 on the SteamOS browser app and a 441 on Iceweasel 38.8.0 (a Firefox variant used in the device’s Linux-based desktop mode). That’s a significant difference, but the main question for us was which browser was more likely to be used in an OOB scenario. With the Steam Machine, the answer was truly “either one.” Many users will use the browser app in the SteamOS environment and many will take the few steps needed to access the desktop environment. In that case, even though one browser was significantly faster than the other, choosing to omit one score in favor of the other would have excluded results from an equally likely OOB environment.

We’re always looking for ways to improve Spotlight. We recently began including more photos for each device, including ones that highlight important form-factor elements and unusual features. Moving forward, we plan to expand Spotlight’s offerings to include automatic score comparisons, additional system information, and improved graphical elements. Most importantly, we’d like to hear your thoughts about Spotlight. What devices and device types would you like to see? Are there specs that would be helpful to you? What can we do to improve Spotlight? Let us know!

Justin
