Category: Benchmark metrics

Seeing the whole picture

on August 21, 2014

In past posts, we’ve discussed how people tend to focus on hardware differences when comparing performance or battery life scores between systems, but software factors such as OS version, choice of browser, and background activity often influence benchmark results on multiple levels.

For example, AnandTech recently published an article explaining how a decision by Google Chrome developers to speed up Web page rendering may have introduced a tradeoff between performance and battery life. To increase performance, Chrome asks Windows to use a 1ms timer interrupt interval instead of the default 15.6ms interval. Unlike applications that wait for the default interval, Chrome gets the chance to do its work more often.

The tradeoff for that increased performance is that waking up the OS more frequently can diminish the effectiveness of a system’s innate power-saving attributes, such as a tick-less kernel and timer coalescing in Windows 8, or efficiency innovations in a new chip architecture. In this case, because of the OS-level interactions between Chrome and Windows, a faster browser could end up having a greater impact on battery life than might initially be suspected.
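To put the two timer intervals in perspective, here is a rough back-of-the-envelope sketch in Python. It only converts the intervals quoted above into timer interrupts per second. The function name is ours, and the figures are an upper bound on extra wake-ups, not a measurement of power draw.

```python
# Timer interrupts per second at a given Windows timer interval.
DEFAULT_INTERVAL_MS = 15.6  # default interval cited in the post
CHROME_INTERVAL_MS = 1.0    # interval Chrome requests, per the post


def wakeups_per_second(interval_ms: float) -> float:
    """Return how many timer interrupts fire each second (hypothetical helper)."""
    return 1000.0 / interval_ms


default_rate = wakeups_per_second(DEFAULT_INTERVAL_MS)
chrome_rate = wakeups_per_second(CHROME_INTERVAL_MS)
ratio = chrome_rate / default_rate

print(f"default interval: {default_rate:.1f} interrupts/s")
print(f"1 ms interval:    {chrome_rate:.1f} interrupts/s")
print(f"ratio:            {ratio:.1f}x")
```

The shorter interval means roughly 15 to 16 times as many opportunities for the system to wake up, which is why power-saving features that depend on long idle periods can lose some of their effect.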

The article discusses the limitations of their test in detail, specifically with regard to Chrome 36 not being able to natively support the same HiDPI resolution as the other browsers, but the point we’re drawing out here is that accurate testing involves taking all relevant factors into consideration. People are used to the idea that changing browsers may impact Web performance, but much less is said about a browser’s impact on battery life.

Justin

Comment on this post in the forums

Posted in Battery life, Benchmark metrics, Google Chrome, Performance benchmarking, Windows 8

It’s all in the presentation

By Eric Hale

on April 24, 2014

The comment period for BatteryXPRT CP2 ended on Monday. Now we are in the final sprint to release the benchmark.

The extensive testing we’ve been doing has meant that we’ve been staring at a lot of numbers. This has led us to make a change in how we present the results. As you would expect, the battery life when you’re running the test using Wi-Fi is different than when you’re running it using your cellular network. Although individual devices vary, the difference is in the vicinity of 10 percent, about the same as the difference between Airplane mode and using Wi-Fi.

BatteryXPRT has always captured a device’s Wi-Fi setting in its disclosure results, but had not included this information with the results. Because we found it so helpful to have the Wi-Fi setting alongside the results, we have changed the presentation of the results to recognize three modes: Airplane, Wi-Fi, and Cellular. We hope that this will avoid confusion as people are using BatteryXPRT.

Note that we have not changed the way the results are calculated. Results you generated during the preview are still valid. However, results from one mode should not be compared to results from another mode.

We’ve been talking a lot about BatteryXPRT, but TouchXPRT is also looking great! We’re looking forward to releasing both of them soon!

Eric

Posted in Android, Battery life, BatteryXPRT 2014 for Android, Benchmark metrics, TouchXPRT 2014

Staying out in the open

By Eric Hale

on October 3, 2013

Back in July, AnandTech publicized some research about possible benchmark optimizations in the Galaxy S4. Yesterday, AnandTech published a much more comprehensive article, “The State of Cheating in Android Benchmarks.” It’s well worth the read.

AnandTech doesn’t accuse any of the benchmarks of being biased; it’s the OEMs who are supposedly doing the optimizations. I will note that none of the XPRT benchmarks are among the whitelisted CPU tests. That being said, I imagine that everyone in the benchmark game is concerned about any implication that their benchmark could be biased.

When I was a kid, my parents taught me that it’s a lot harder to cheat in the open. This is one of the reasons we believe so strongly in the community model for software development. The source code is available to anyone who joins the community. It’s impossible to hide any biases. At the same time, it allows us to control derivative works. That’s necessary to avoid biased versions of the benchmarks being published. We think the community model strikes the right balance.

However, any time there is a system, someone will try to game it. We’ll always be on the lookout for optimizations that happen outside the benchmarks.

Eric

Posted in Android, Benchmark metrics, Benchmarks in general, BenchmarkXPRT development community, Collaborative benchmark development, MobileXPRT

Lies, damned lies, and statistics

By Eric Hale

on April 11, 2013

No one knows who first said “lies, damned lies, and statistics,” but it’s easy to understand why they said it. It’s no surprise that the bestselling statistics book in history is titled How to Lie with Statistics. While the title is facetious, it is certainly true that statistics can be confusing—consider the word “average,” which can refer to the mean, median, or mode. “Mean average,” in turn, can refer to the arithmetic mean, the geometric mean, or the harmonic mean. It’s enough to make a non-statistician’s head spin.
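To see how far apart these “averages” can land on the same data, here is a small illustration using Python’s standard `statistics` module. The sample values are made up for the example and are not benchmark results.

```python
import statistics

# Hypothetical sample values, chosen only to illustrate the point.
samples = [2, 4, 4, 8]

arithmetic = statistics.mean(samples)           # sum divided by count
median = statistics.median(samples)             # middle value
mode = statistics.mode(samples)                 # most frequent value
geometric = statistics.geometric_mean(samples)  # nth root of the product
harmonic = statistics.harmonic_mean(samples)    # count / sum of reciprocals

print(f"arithmetic mean: {arithmetic:.3f}")
print(f"median:          {median:.3f}")
print(f"mode:            {mode}")
print(f"geometric mean:  {geometric:.3f}")
print(f"harmonic mean:   {harmonic:.3f}")
```

For positive values that are not all equal, the harmonic mean is the smallest of the three means and the arithmetic mean is the largest, so which one a report uses can change the story the numbers tell.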

In fact, a number of people have been confused by the confidence interval WebXPRT reports. We believe that the best way to stand behind your results is to be completely open about how you crunch the numbers. To this end, we released the white paper WebXPRT 2013 results calculation and confidence interval this past Monday.

This white paper, which does not require a background in mathematics, explains what the WebXPRT confidence interval is and how it differs from the benchmark variability we sometimes talk about. The paper also gives an overview of the statistical and mathematical techniques WebXPRT uses to translate the raw timing numbers into results.
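For readers who have not met the idea before, here is a generic sketch of a confidence interval around the mean of repeated timings. It is not WebXPRT’s calculation; the white paper and its spreadsheet are the authority on that. The sketch assumes a normal approximation, and both the function name and the timing values are hypothetical.

```python
import math
import statistics


def confidence_interval(samples, confidence=0.95):
    """Return (low, high) bounds around the sample mean.

    Uses a normal approximation; with very few runs, a t-distribution
    would give a somewhat wider interval.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return mean - margin, mean + margin


# Hypothetical timings (ms) from five runs of the same test.
timings = [100, 102, 98, 101, 99]
low, high = confidence_interval(timings)
print(f"mean = {statistics.fmean(timings):.1f} ms, "
      f"95% interval = {low:.2f} to {high:.2f} ms")
```

The interval describes how precisely the runs pin down the mean. That is a different question from how much results vary from one device or configuration to another.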

Because sometimes the devil is in the details, we wanted to augment our overview by showing exactly how WebXPRT calculates results. The white paper is accompanied by a spreadsheet that reproduces the calculations WebXPRT uses. If you are mathematically inclined and would like to suggest improvements to the process, by all means let us know!

Eric

Posted in Benchmark metrics, WebXPRT, WebXPRT 2013 results, White papers

Keep them coming!

By Eric Hale

on January 31, 2013

Questions and comments have continued to come in since the Webinar last week. Here are a few of them:

How long are results valid? For a reviewer like us, we need to know that we can reuse results for a reasonable length of time. There is a tension between keeping results stable and keeping the benchmark current enough for the results to be relevant. Historically, HDXPRT allowed at least a year between releases. Based on the feedback we’ve received, a year seems like a reasonable length of time.
Is HDXPRT command line operable? (asked by a community member with a scripted suite of tests) HDXPRT 2012 is not, but we will consider adding a command line interface for HDXPRT 2013. While most casual users don’t need a command line interface, it could be very valuable to those of us using HDXPRT in labs.
“I would be hesitant to overemphasize the running time of HDXPRT. The more applications it runs, the more it can differentiate things and the more interesting it is to those of us who run it at a professional level. If I could say ‘This gives a complete overview of the performance of this system,’ that would actually save time.” This comment was a surprise, given the amount of feedback we received saying that HDXPRT was too large. However, this gets to the heart of why we all need to be careful as we consider which applications to include in HDXPRT 2013.

If you had to miss the Webinar, it’s available at the BenchmarkXPRT 2013 Webinars page.

We’re planning to release the HDXPRT 2013 RFC next week. We’re looking forward to your comments.

Eric

Posted in Benchmark metrics, Benchmarking, HDXPRT, HDXPRT workloads

TouchXPRT in the fast lane

By Bill Catchings

on October 25, 2012

I titled last week’s blog “Putting the TouchXPRT pedal to the metal.” The metaphor still applies. On Monday, we released TouchXPRT 2013 Community Preview 1 (CP1). Members can download it here.

CP1 contains five scenarios based on our research and community feedback. The scenarios are Beautify Photo Album, Prepare Photos for Sharing, Convert Videos for Sharing, Export Podcast to MP3, and Create Slideshow from Photos.

Each scenario gives two types of results. There’s a rate, which allows for simple “bigger is better” comparisons. CP1 also gives the elapsed time for each scenario, which is easier to grasp intuitively. Each approach has its advantages. We’d like to get your feedback on whether you’d like us to pick one of those metrics for the final version of TouchXPRT 2013 or whether it makes more sense to include both. You’ll find a fuller description of the scenarios and the results in the TouchXPRT 2013 Community Preview 1 Design overview.
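The relationship between the two kinds of results can be sketched as follows. We assume here, purely for illustration, that a rate is work items divided by elapsed seconds; the Design overview is the place to look for how CP1 actually defines it. The class, device names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    device: str
    items_processed: int    # hypothetical amount of work, e.g. photos
    elapsed_seconds: float  # lower is better

    @property
    def rate(self) -> float:
        """Items per second (assumed definition); higher is better."""
        return self.items_processed / self.elapsed_seconds


results = [
    ScenarioResult("Device A", 50, 25.0),
    ScenarioResult("Device B", 50, 40.0),
]

# The two metrics rank devices the same way, just in opposite directions.
by_rate = sorted(results, key=lambda r: r.rate, reverse=True)
by_time = sorted(results, key=lambda r: r.elapsed_seconds)

for r in by_rate:
    print(f"{r.device}: {r.rate:.2f} items/s, {r.elapsed_seconds:.1f} s")
```

When the amount of work is the same, both metrics carry the same information. The choice is mostly about which direction readers find easier to compare.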

While you’re looking at CP1, we’re getting the source ready to release. To check out the source, you’ll need a system running Windows 8, with Visual Studio 2012 installed. We hope to release it on Friday. Keep your eye on the TouchXPRT forums for more details.

Post your feedback to the TouchXPRT forum, or e-mail it to TouchXPRTSupport@principledtechnologies.com. Do you want more scenarios? Different metrics? A new UI feature? Let us know! Make TouchXPRT the benchmark you want it to be.

As I explained last week, we released CP1 without any restrictions on publishing results. It seems that AnandTech was the first to take advantage of that. Read AnandTech’s Microsoft Surface Review to see TouchXPRT in action.

We are hoping that other folks take advantage of CP1’s capability to act as a cross-platform benchmark on the new class of Windows 8 devices. Come join us in the fast lane!

Bill

Posted in Benchmark metrics, Collaborative benchmark development, Touch-based benchmarking, Touch-based devices, TouchXPRT development process, TouchXPRT release cycle, TouchXPRT results
