Category: Performance testing on tablets

Notes from the lab: choosing a calibration system for MobileXPRT 3

on October 4, 2018

Last week, we shared some details about what to expect in MobileXPRT 3. This week, we want to provide some insight into one part of the MobileXPRT development process, choosing a calibration system.

First, some background. For each of the benchmarks in the XPRT family, we select a calibration system using criteria we’ll explain below. This system serves as a reference point, and we use it to calculate scores that will help users understand a single benchmark result. The calibration system for MobileXPRT 2015 is the Motorola DROID RAZR M. We structured our calculation process so that the mean performance score from repeated MobileXPRT 2015 runs on that device is 100. A device that completes the same workloads 20 percent faster than the DROID RAZR M would have a performance score of 120, and one that performs the test 20 percent more slowly would have a score of 80. (You can find a more in-depth explanation of MobileXPRT score calculations in the Exploring MobileXPRT 2015 white paper.)

When selecting a calibration device, we are looking for a relevant reference point in today’s market. The device should be neither too slow to handle modern workloads nor so fast that it outscores most devices on the market. It should represent a level of performance that is close to what the majority of consumers experience, and one that will continue to be relevant for some time. This approach helps to build context for the meaning of the benchmark’s overall score. Without that context, testers can’t tell whether a score is fast or slow just by looking at the raw number. When compared to a well-known standard such as the calibration device, however, the score has more informative value.

To determine a suitable calibration device for MobileXPRT 3, we started by researching the most popular Android phones by market share around the world. It soon became clear that in many major markets, the Samsung Galaxy S8 ranked first or second, or at least appeared in the top five. As last year’s first Samsung flagship, the S8 is no longer on the cutting edge, but it has specs that many current mid-range phones are deploying, and the hardware should remain relevant for a couple of years.

For all of these reasons, we made the Samsung Galaxy S8 the calibration device for MobileXPRT 3. The model in our lab has a Qualcomm Snapdragon 835 SoC, 4 GB of RAM, and runs Android 7.0 (Nougat). We think it has the balance we’re looking for.

If you have any questions or concerns about MobileXPRT 3, calibration devices, or score calculations, please let us know. We look forward to sharing more information about MobileXPRT 3 as we get closer to the community preview.

Justin

Posted in Benchmark metrics, BenchmarkXPRT, BenchmarkXPRT development community, Mobile devices, MobileXPRT, MobileXPRT 3, Performance benchmarking, Performance testing on tablets, What makes a good benchmark? | Tagged Android, benchmarks, BenchmarkXPRT, Calibration, Droid, Galaxy, mobile, MobileXPRT, Motorola, Performance, phones, Samsung |

More on the way for the XPRT Weekly Tech Spotlight

By Justin Greene

on May 24, 2018

In the coming months, we’ll continue to add more devices and helpful features to the XPRT Weekly Tech Spotlight. We’re especially interested in adding data points and visual aids that make it easier to quickly understand the context of each device’s test scores. For instance, those of us who are familiar with WebXPRT 3 scores know that an overall score of 250 is pretty high, but site visitors who are unfamiliar with WebXPRT probably won’t know how that score compares to scores for other devices.

We designed Spotlight to be a source of objective data, in contrast to sites that provide subjective ratings for devices. As we pursue our goal of helping users make sense of scores, we want to maintain this objectivity and avoid presenting information in ways that could be misleading.

Introducing comparison aids to the site is forcing us to make some tricky decisions. Because we value input from XPRT community members, we’d love to hear your thoughts on one of the questions we’re facing: How should our default view present a device’s score?

We see three options:

1) Present the device’s score in relation to the overall high and low scores for that benchmark across all devices.
2) Present the device’s score in relation to the overall high and low scores for that benchmark across the broad category of devices to which that device belongs (e.g., phones).
3) Present the device’s score in relation to the overall high and low scores for that benchmark across a narrower sub-category of devices to which that device belongs (e.g., high-end flagship phones).

To think this through, consider WebXPRT, which runs on desktops, laptops, phones, tablets, and other devices. Typically, the WebXPRT scores for phones and tablets are lower than scores for desktop and laptop systems. The first approach helps to show just how fast high-end desktops and laptops handle the WebXPRT workloads, but it could make a phone or tablet look slow, even if its score was good for its category. The second approach would prevent unfair default comparisons between different device types but would still present comparisons between devices that are not true competitors (e.g., flagship phones vs. budget phones). The third approach is the most careful, but would introduce an element of subjectivity because determining the sub-category in which a device belongs is not always clear cut.

Do you have thoughts on this subject, or recommendations for Spotlight in general? If so, Let us know.

Justin

Posted in Cross-platform benchmarks, Mobile devices, Performance of computing devices, Performance testing on tablets, WebXPRT, WebXPRT 3, XPRT Weekly Tech Spotlight | Tagged benchmark, desktops, laptops, phones, Spotlight, tablets, WebXPRT, XPRT Weekly Tech Spotlight |

Here’s to 100 more!

By Justin Greene

on January 18, 2018

This week’s Essential Phone entry marks the 100th device that we’ve featured in the XPRT Weekly Tech Spotlight! It’s a notable milestone for us as we work toward our goal of building a substantial library of device information that buyers can use to compare devices. In celebration, I thought it would be fun to share some Spotlight-related stats.

Our first Spotlight entry was the Google Pixel C way back on February 8, 2016, and we’ve featured a wide array of devices since then:

33 phones
16 laptops
16 tablets
16 2-in-1s
6 small-form-factor PCs
5 desktops
5 game consoles
3 all-in-ones

In addition to a wide variety of device types, we try to include a wide range of vendors. So far, we’ve featured devices from Acer, Alcatel, Alienware, Amazon, Apple, ASUS, BLU, CHUWI, Dell, Essential, Fujitsu, Google, HP, HTC, Huawei, Intel, LeEco, Lenovo, LG, Microsoft, NVIDIA, OnePlus, Razer, Samsung, Sony, Syber, Xiaomi, and ZTE. We look forward to adding many more to that list during the year ahead.

XPRT Spotlight is a great way for device vendors and manufacturers to share PT-verified specs and test results with buyers around the world. If you’re interested in sending in a device for testing, please contact XPRTSpotlight@PrincipledTechnologies.com.

There’s a lot more to come for XPRT Spotlight, and we’re constantly working on new features and improvements for the page. Are there any specific devices or features that you would like to see in the Spotlight? Let us know.

Justin

Posted in Performance benchmarking, Performance of computing devices, Performance testing on tablets, XPRT Weekly Tech Spotlight |

Best practices

By Justin Greene

on August 3, 2017

Recently, a tester wrote in and asked for help determining why they were seeing different WebXPRT scores on two tablets with the same hardware configuration. The scores differed by approximately 7.5 percent. This can happen for many reasons, including different software stacks, but score variability can also result from different testing behavior and environments. While some degree of variability is natural, the question provides us with a great opportunity to talk about the basic benchmarking practices we follow in the XPRT lab, practices that contribute to the most consistent and reliable scores.

Below, we list a few basic best practices you might find useful in your testing. While they’re largely in the context of the WebXPRT focus on evaluating browser performance, several of these practices apply to other benchmarks as well.

Test with clean images: We use an out-of-box (OOB) method for testing XPRT Spotlight devices. OOB testing means that other than initial OS and browser version updates that users are likely to run after first turning on the device, we change as little as possible before testing. We want to assess the performance that buyers are likely to see when they first purchase the device, before installing additional apps and utilities. This is the best way to provide an accurate assessment of the performance retail buyers will experience. While OOB is not appropriate for certain types of testing, the key is to not test a device that’s bogged down with programs that influence results unnecessarily.
Turn off updates: We do our best to eliminate or minimize app and system updates after initial setup. Some vendors are making it more difficult to turn off updates completely, but you should always account for update settings.
Get a feel for system processes: Depending on the system and the OS, quite a lot of system-level activity can be going on in the background after you turn it on. As much as possible, we like to wait for a stable baseline (idle) of system activity before kicking off a test. If we start testing immediately after booting the system, we often see higher variability in the first run before the scores start to tighten up.
Disclosure is not just about hardware: Most people know that different browsers will produce different performance scores on the same system. However, testers aren’t always aware of shifts in performance between different versions of the same browser. While most updates don’t have a large impact on performance, a few updates have increased (or even decreased) browser performance by a significant amount. For this reason, it’s always worthwhile to record and disclose the extended browser version number for each test run. The same principle applies to any other relevant software.
Use more than one data point: Because of natural variability, our standard practice in the XPRT lab is to publish a score that represents the median from at least three to five runs. If you run a benchmark only once, and the score differs significantly from other published scores, your result could be an outlier that you would not see again under stable testing conditions.

We hope those tips will make testing a little easier for you. If you have any questions about the XPRTs, or about benchmarking in general, feel free to ask!

Justin

Posted in Benchmark metrics, Benchmarking, Browser-based benchmarks, Performance benchmarking, Performance testing on tablets, Uncategorized, WebXPRT, WebXPRT 2013, WebXPRT 2015, XPRT Weekly Tech Spotlight |

Getting to know TouchXPRT

By Justin Greene

on May 4, 2017

Many of our community members first encountered the XPRTs when reading about WebXPRT or MobileXPRT in a device review, using TouchXPRT or HDXPRT in an OEM lab, or using BatteryXPRT or CrXPRT to evaluate devices for bulk purchasing on behalf of a corporation or government agency. They know that specific XPRT provided great value in that context, but may not know about the other members of the XPRT family.

To help keep folks up to date on the full extent of XPRT capabilities, we like to occasionally “reintroduce” each of the XPRTs. This week, we invite you to get to know TouchXPRT.

We developed TouchXPRT 2016 as a Universal Windows Platform app for Windows 10. We wanted to offer a free tool that would provide consumers with objective information about how well a Windows 10 or Windows 10 Mobile laptop, tablet, or phone handles common media tasks. To do this, TouchXPRT runs five tests that simulate the kinds of photo, video, and music editing tasks people do every day. It measures how quickly the device completes each of those tasks and provides an overall score. To compare device scores, go to TouchXPRT.com and click View Results, where you’ll find scores from many different Windows 10 and Windows 10 Mobile devices.

TouchXPRT is easy to install and run, and is a great resource for anyone who wants to evaluate the performance of a Windows 10 device.

If you’d like to run TouchXPRT:

Simply download TouchXPRT from the Microsoft Store. (If that doesn’t work for you, you can also download it directly from TouchXPRT.com.) Installing it should take about 15 minutes, and the TouchXPRT 2016 release notes provide step-by-step instructions.

If you’d like to dig into the details:

Check out the Exploring TouchXPRT 2016 white paper. In it, we discuss the TouchXPRT development process, its component tests and workloads, and how it calculates individual workload and overall scores. We also provide instructions for automated testing.

BenchmarkXPRT Development Community members also have access to the TouchXPRT source code, so consider joining today. There’s no obligation and membership is free for members of any company or organization with an interest in benchmarks.

If you haven’t tried running TouchXPRT before, give it a shot and let us know what you think!

Justin

Posted in BenchmarkXPRT, BenchmarkXPRT development community, Mobile devices, Performance benchmarking, Performance testing on tablets, TouchXPRT, TouchXPRT 2016, TouchXPRT development process, TouchXPRT results |

A clarification from Brett Howse

By Eric Hale

on February 18, 2016

A couple of weeks ago, I described a conversation I had with Brett Howse of AnandTech. Brett was kind enough to send a clarification of some of his remarks, which he gave us permission to share with you.

“We are at a point in time where the technology that’s been called mobile since its inception is now at a point where it makes sense to compare it to the PC. However we struggle with the comparisons because the tools used to do the testing do not always perform the same workloads. This can be a major issue when a company uses a mobile workload, and a desktop workload, but then puts the resulting scores side by side, which can lead to misinformed conclusions. This is not only a CPU issue either, since on the graphics side we have OpenGL well established, along with DirectX, in the PC space, but our mobile workloads tend to rely on OpenGL ES, with less precision asked of the GPU, and GPUs designed around this. Getting two devices to run the same work is a major challenge, but one that has people asking what the results would be.”

I really appreciate Brett taking the time to respond. What are your thoughts in these issues? Please let us know!

Eric

Posted in AnandTech, Collaborative benchmark development, DirectX, Future of performance evaluation, Let us know your thoughts, Mobile devices, Open GL, Performance benchmarking, Performance of computing devices, Performance testing on tablets, What makes a good benchmark? |

Category: Performance testing on tablets

Notes from the lab: choosing a calibration system for MobileXPRT 3

More on the way for the XPRT Weekly Tech Spotlight

Here’s to 100 more!

Best practices

Getting to know TouchXPRT

If you’d like to run TouchXPRT:

If you’d like to dig into the details:

A clarification from Brett Howse

Check out the other XPRTs: