
More on the way for the XPRT Weekly Tech Spotlight

In the coming months, we’ll continue to add more devices and helpful features to the XPRT Weekly Tech Spotlight. We’re especially interested in adding data points and visual aids that make it easier to quickly understand the context of each device’s test scores. For instance, those of us who are familiar with WebXPRT 3 scores know that an overall score of 250 is pretty high, but site visitors who are unfamiliar with WebXPRT probably won’t know how that score compares to scores for other devices.

We designed Spotlight to be a source of objective data, in contrast to sites that provide subjective ratings for devices. As we pursue our goal of helping users make sense of scores, we want to maintain this objectivity and avoid presenting information in ways that could be misleading.

Introducing comparison aids to the site is forcing us to make some tricky decisions. Because we value input from XPRT community members, we’d love to hear your thoughts on one of the questions we’re facing: How should our default view present a device’s score?

We see three options:

1) Present the device’s score in relation to the overall high and low scores for that benchmark across all devices.
2) Present the device’s score in relation to the overall high and low scores for that benchmark across the broad category of devices to which that device belongs (e.g., phones).
3) Present the device’s score in relation to the overall high and low scores for that benchmark across a narrower sub-category of devices to which that device belongs (e.g., high-end flagship phones).

To think this through, consider WebXPRT, which runs on desktops, laptops, phones, tablets, and other devices. Typically, the WebXPRT scores for phones and tablets are lower than scores for desktop and laptop systems. The first approach helps to show just how fast high-end desktops and laptops handle the WebXPRT workloads, but it could make a phone or tablet look slow, even if its score was good for its category. The second approach would prevent unfair default comparisons between different device types but would still present comparisons between devices that are not true competitors (e.g., flagship phones vs. budget phones). The third approach is the most careful, but would introduce an element of subjectivity because determining the sub-category in which a device belongs is not always clear cut.
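
To make the trade-offs concrete, here's a minimal sketch of what the three default views might look like in code. The device names, categories, and scores are made up for illustration and are not actual Spotlight or WebXPRT data; the point is only to show how the same score lands in very different places depending on the comparison scope.

```python
# Hypothetical results: (device, broad category, sub-category, score).
# These entries are illustrative only, not real Spotlight or WebXPRT data.
results = [
    ("Desktop A", "desktop", "high-end desktop", 250),
    ("Laptop B",  "laptop",  "ultraportable",    180),
    ("Phone C",   "phone",   "flagship phone",   130),
    ("Phone D",   "phone",   "budget phone",      70),
]

def score_range(devices):
    """Return the (low, high) scores for a group of devices."""
    scores = [score for _, _, _, score in devices]
    return min(scores), max(scores)

def position_in_range(score, low, high):
    """Express a score as a 0-100 position between a group's low and high."""
    if high == low:
        return 100.0
    return 100.0 * (score - low) / (high - low)

device = ("Phone C", "phone", "flagship phone", 130)

scopes = {
    "all devices":     results,                                    # option 1
    "phones":          [d for d in results if d[1] == device[1]],  # option 2
    "flagship phones": [d for d in results if d[2] == device[2]],  # option 3
}

for label, group in scopes.items():
    low, high = score_range(group)
    pos = position_in_range(device[3], low, high)
    print(f"{device[0]} vs. {label}: {pos:.0f}% of the way from {low} to {high}")
```

Against all devices, the hypothetical phone sits near the bottom of the range; against its own category or sub-category, it looks much stronger, which is exactly the presentation question we're wrestling with.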

Do you have thoughts on this subject, or recommendations for Spotlight in general? If so, let us know.

Justin

Here’s to 100 more!

This week’s Essential Phone entry marks the 100th device that we’ve featured in the XPRT Weekly Tech Spotlight! It’s a notable milestone for us as we work toward our goal of building a substantial library of device information that buyers can use to compare devices. In celebration, I thought it would be fun to share some Spotlight-related stats.

Our first Spotlight entry was the Google Pixel C way back on February 8, 2016, and we’ve featured a wide array of devices since then:

  • 33 phones
  • 16 laptops
  • 16 tablets
  • 16 2-in-1s
  • 6 small-form-factor PCs
  • 5 desktops
  • 5 game consoles
  • 3 all-in-ones

In addition to a wide variety of device types, we try to include a wide range of vendors. So far, we’ve featured devices from Acer, Alcatel, Alienware, Amazon, Apple, ASUS, BLU, CHUWI, Dell, Essential, Fujitsu, Google, HP, HTC, Huawei, Intel, LeEco, Lenovo, LG, Microsoft, NVIDIA, OnePlus, Razer, Samsung, Sony, Syber, Xiaomi, and ZTE. We look forward to adding many more to that list during the year ahead.

XPRT Spotlight is a great way for device vendors and manufacturers to share PT-verified specs and test results with buyers around the world. If you’re interested in sending in a device for testing, please contact XPRTSpotlight@PrincipledTechnologies.com.

There’s a lot more to come for XPRT Spotlight, and we’re constantly working on new features and improvements for the page. Are there any specific devices or features that you would like to see in the Spotlight? Let us know.

Justin

Best practices

Recently, a tester wrote in and asked for help determining why they were seeing different WebXPRT scores on two tablets with the same hardware configuration. The scores differed by approximately 7.5 percent. This can happen for many reasons, including different software stacks, but score variability can also result from differences in testing behavior and environment. While some degree of variability is natural, the question gives us a great opportunity to talk about the basic benchmarking practices we follow in the XPRT lab, practices that help produce the most consistent and reliable scores.
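
As a quick illustration of the arithmetic, here's how a difference like that might be computed; the two scores below are hypothetical stand-ins, not the tester's actual results.

```python
# Hypothetical overall scores from the two tablets (not the tester's real numbers).
score_a = 215
score_b = 200

# Percent difference relative to the lower score.
percent_diff = (score_a - score_b) / score_b * 100
print(f"Difference: {percent_diff:.1f}%")  # prints "Difference: 7.5%"
```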

Below, we list a few basic best practices you might find useful in your own testing. While we frame them largely in terms of WebXPRT's focus on evaluating browser performance, several of these practices apply to other benchmarks as well.

  • Test with clean images: We use an out-of-box (OOB) method for testing XPRT Spotlight devices. OOB testing means that other than the initial OS and browser updates that users are likely to run after first turning on the device, we change as little as possible before testing. We want to assess the performance buyers are likely to see when they first purchase the device, before they install additional apps and utilities. While the OOB approach is not appropriate for every type of testing, the key is to avoid testing a device that's bogged down with programs that unnecessarily influence results.
  • Turn off updates: We do our best to eliminate or minimize app and system updates after initial setup. Some vendors are making it more difficult to turn off updates completely, but you should always account for update settings.
  • Get a feel for system processes: Depending on the system and the OS, quite a lot of system-level activity can be going on in the background after you turn on the device. As much as possible, we like to wait for system activity to settle to a stable, idle baseline before kicking off a test. If we start testing immediately after booting the system, we often see higher variability in the first run before the scores start to tighten up.
  • Disclosure is not just about hardware: Most people know that different browsers will produce different performance scores on the same system. However, testers aren’t always aware of shifts in performance between different versions of the same browser. While most updates don’t have a large impact on performance, a few updates have increased (or even decreased) browser performance by a significant amount. For this reason, it’s always worthwhile to record and disclose the extended browser version number for each test run. The same principle applies to any other relevant software.
  • Use more than one data point: Because of natural variability, our standard practice in the XPRT lab is to publish a score that represents the median of at least three to five runs. If you run a benchmark only once, and the score differs significantly from other published scores, your result could be an outlier that you would not see again under stable testing conditions. (The sketch after this list shows what this practice might look like in script form.)
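
To show how the last two practices fit together, here's a minimal sketch of the multiple-run approach. The run_webxprt() helper, the canned scores, and the version strings are all hypothetical placeholders; in real testing, each run would come from actually launching the benchmark, and the disclosure details would come from the system under test.

```python
from statistics import median

# Canned, made-up scores standing in for five real benchmark runs, used only
# so this sketch executes end to end.
simulated_scores = [212, 208, 215, 210, 209]
run_iter = iter(simulated_scores)

def run_webxprt() -> float:
    """Placeholder for launching one benchmark run and collecting its overall score."""
    return next(run_iter)

# Record the full software stack alongside the results so the numbers can be
# reproduced and compared fairly later (extended browser version, OS build, etc.).
disclosure = {
    "browser": "Example Browser 99.0.1234.56 (64-bit)",  # extended version string
    "os_build": "Example OS build 12345.678",
    "benchmark": "WebXPRT 3",
}

scores = [run_webxprt() for _ in range(5)]  # at least three to five runs

print("Run scores:", scores)
print("Median (the score we would publish):", median(scores))
print("Disclosure:", disclosure)
```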

We hope those tips will make testing a little easier for you. If you have any questions about the XPRTs, or about benchmarking in general, feel free to ask!

Justin

Getting to know TouchXPRT

Many of our community members first encountered the XPRTs when reading about WebXPRT or MobileXPRT in a device review, using TouchXPRT or HDXPRT in an OEM lab, or using BatteryXPRT or CrXPRT to evaluate devices for bulk purchasing on behalf of a corporation or government agency. They know that specific XPRT provided great value in that context, but may not know about the other members of the XPRT family.

To help keep folks up to date on the full extent of XPRT capabilities, we like to occasionally “reintroduce” each of the XPRTs. This week, we invite you to get to know TouchXPRT.

We developed TouchXPRT 2016 as a Universal Windows Platform app for Windows 10. We wanted to offer a free tool that would provide consumers with objective information about how well a Windows 10 or Windows 10 Mobile laptop, tablet, or phone handles common media tasks. To do this, TouchXPRT runs five tests that simulate the kinds of photo, video, and music editing tasks people do every day. It measures how quickly the device completes each of those tasks and provides an overall score. To compare device scores, go to TouchXPRT.com and click View Results, where you’ll find scores from many different Windows 10 and Windows 10 Mobile devices.
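
For anyone curious about how timed workloads can roll up into a single number, here is a purely illustrative sketch. The workload names, times, baselines, and the geometric-mean aggregation are assumptions for demonstration only, not the actual TouchXPRT workloads or scoring formula; the Exploring TouchXPRT 2016 white paper mentioned below documents the real calculation.

```python
from math import prod  # Python 3.8+

# Hypothetical completion times (seconds) for five media-style workloads,
# with made-up calibration baselines. Illustrative only.
times = {
    "photo_edit":    42.0,
    "photo_collage": 30.5,
    "video_convert": 55.2,
    "music_edit":    25.8,
    "slideshow":     38.4,
}
baselines = {
    "photo_edit":    50.0,
    "photo_collage": 35.0,
    "video_convert": 60.0,
    "music_edit":    30.0,
    "slideshow":     45.0,
}

# Finishing faster than the baseline yields a ratio above 1. One common way to
# combine such ratios is a geometric mean, scaled to a round reference number.
ratios = [baselines[name] / times[name] for name in times]
overall = 100 * prod(ratios) ** (1 / len(ratios))
print(f"Illustrative overall score: {overall:.0f}")
```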

TouchXPRT is easy to install and run, and is a great resource for anyone who wants to evaluate the performance of a Windows 10 device.

If you’d like to run TouchXPRT:

Simply download TouchXPRT from the Microsoft Store. (If that doesn’t work for you, you can also download it directly from TouchXPRT.com.) Installing it should take about 15 minutes, and the TouchXPRT 2016 release notes provide step-by-step instructions.

If you’d like to dig into the details:

Check out the Exploring TouchXPRT 2016 white paper. In it, we discuss the TouchXPRT development process, its component tests and workloads, and how it calculates individual workload and overall scores. We also provide instructions for automated testing.

BenchmarkXPRT Development Community members also have access to the TouchXPRT source code, so consider joining today. There's no obligation, and membership is free for anyone from a company or organization with an interest in benchmarks.

If you haven’t tried running TouchXPRT before, give it a shot and let us know what you think!

Justin

A clarification from Brett Howse

A couple of weeks ago, I described a conversation I had with Brett Howse of AnandTech. Brett was kind enough to send a clarification of some of his remarks, which he gave us permission to share with you.

“We are at a point in time where the technology that’s been called mobile since its inception is now at a point where it makes sense to compare it to the PC. However we struggle with the comparisons because the tools used to do the testing do not always perform the same workloads. This can be a major issue when a company uses a mobile workload, and a desktop workload, but then puts the resulting scores side by side, which can lead to misinformed conclusions. This is not only a CPU issue either, since on the graphics side we have OpenGL well established, along with DirectX, in the PC space, but our mobile workloads tend to rely on OpenGL ES, with less precision asked of the GPU, and GPUs designed around this. Getting two devices to run the same work is a major challenge, but one that has people asking what the results would be.”

I really appreciate Brett taking the time to respond. What are your thoughts on these issues? Please let us know!

Eric

Comparing apples and oranges?

My first day at CES, I had breakfast with Brett Howse from AnandTech. It was a great opportunity to get the perspective of a savvy tech journalist and frequent user of the XPRTs.

During our conversation, Brett raised concerns about comparing mobile devices to PCs. As mobile devices get more powerful, the performance and capability gaps between them and PCs are narrowing. That makes it more common to compare upper-end mobile devices to PCs.

People have long used different versions of benchmarks when comparing these two classes of devices. For example, the images for benchmarking a phone might be smaller than those for benchmarking a PC. Also, because of processor differences, the benchmarks might be built differently, say a 16- or 32-bit executable for a mobile device, and a 64-bit version for a PC. That was fine when no one was comparing the devices directly, but can be a problem now.

This issue is more complicated than it sounds. For those cases where a benchmark uses a dumbed-down version of the workload for mobile devices, comparing the results is clearly not valid. However, let's assume that the workload stays the same, and that you run a 32-bit benchmark on a tablet and a 64-bit version on a PC. Is the comparison valid? It may be, if you are talking about the day-to-day performance a user is likely to encounter. However, it may not be valid if you are making a statement about the potential performance of the device itself.

Brett would like the benchmarking community to take charge of this issue and provide guidance about how to compare mobile devices and PCs. What are your thoughts?

Eric
