BenchmarkXPRT Blog banner

Author Archives: Eric Hale

Nothing to hide

I recently saw an article in ZDNet by my old friend Steven J. Vaughan-Nichols that talks about how NetMarketShare and StatCounter reported a significant jump in the operating system market shares for Linux and Chrome OS. One frustration Vaughan-Nichols alluded to in the article is the lack of transparency into how these firms calculated market share, so he can’t gauge how reliable they are. Because neither NetMarketShare nor StatCounter disclosed their methods, there’s no sure way for interested observers to verify the numbers. Steven prefers the data from the federal government’s Digital Analytics Program (DAP). DAP makes its data freely available, so you can run your own calculations. Transparency generates trust.

Transparency is a core value for the XPRTs. We’ve written before about how statistics can be misleading. That’s why we’ve always disclosed exactly how the XPRTs calculate performance results, and the way BatteryXPRT calculates battery life. It’s also why we make each XPRT’s source code available to community members. We want to be open and honest about how we do things, and our open development community model fosters the kind of constructive feedback that helps to continually improve the XPRTs.

We’d love for you to be a part of that process, so if you have questions or suggestions for improvement, let us know. If you’d like to gain access to XPRT source code and previews of upcoming benchmarks, today is a great day to join the community!


Looking for performance clues

We’ve written before about how the operating system and other software can influence test scores and even battery life. Benchmarks like the XPRTs provide overall results, but teasing out which factors affect those results may require some detective work. The key is to collect individual data points as evidence to what may be causing performance changes.

The Apple iOS 11 rollout last month is an excellent example of the effect of software on device performance. Angry tweets started almost immediately after the update, claiming that iOS 11 drained device batteries. iPhone users here at PT experienced similar issues. What was the cause of that performance drop? The hardware remained the same. So, did software cause the problem?

Less than a week after the rollout, Mashable published an explanation of possible causes. The article quotes research from mobile security firm Wandera showing that, for the 50,000 “moderate to heavy iPhone and iPad users” in the study, devices running iOS 11 burned through their battery at much faster rates than the same devices running iOS 10. They cite two possible causes:

    • devices often re-categorize the files stored on them for every new OS install, which may account for some of the battery issues.
    • many apps are not optimized for iOS 11 yet.


While these explanations make sense, with a little more digging, we could get closer to actually solving the mystery instead of guessing at the causes. After all, it is also possible that people are using iOS 11 differently from iOS 10. So, how could a dedicated sleuth investigate further? Anyone using benchmarks and hands-on testing to sift through various scenarios and configurations could get us closer to solving this mystery and any other software-based performance anomalies. But it’s a daunting task—changing only one variable at a time and recording the results is like pounding the streets and knocking on doors to solve a case.

In all likelihood, some combination of Apple iOS updates and application changes will improve the battery life for iOS 11. In the meantime, we wish we had an XPRT that could test battery life on iOS. Who knows, maybe some future version of WebXPRT will be able to help in future sleuthing.


Everything old is new again

I recently saw an article called “4 lessons for modern software developers from 1970s mainframe programming.” This caught my eye because I started programming in the late 1970s, and my first programming environment was an IBM 370.

The author talks about how, back in the old days, you had to write tight code because memory and computing resources were limited. He also talks about the great amount of time we spent planning, writing, proofreading, and revising our code—all on paper—before running it. We did that because computing resources were expensive and you would get in trouble for using too many. He’s right about that—I got reamed out a couple of times!

At first, it seemed like this was just another article by an old programmer talking about how sloppy and lazy the new generation is, but then he made an interesting point. Programming for embedded processors reintroduces the types of resource limitations we used to have to deal with. Cloud computing reintroduces having to pay for computing resources based on usage.

I personally think he goes too far in making his point – there are a lot times when rapid prototyping and iterative development are the best way to do things. However, his main thesis has merit. Some new applications may benefit from doing things the old way.

Cloud computing and embedded processors are, of course, important in machine learning applications. As we’re working on a machine learning XPRT, we’ll be following best practices for this new environment!


Apples and pears vs. oranges and bananas

When people talk about comparing disparate things, they often say that you’re comparing apples and oranges. However, sometimes that expression doesn’t begin to describe the situation.

Recently, Justin wrote about using CrXPRT on systems running Neverware CloudReady OS. In that post, he noted that we couldn’t guarantee that using CrXPRT on CloudReady and Chrome OS systems would be a fair comparison. Not surprisingly, that prompted the question “Why not?”

Here’s the thing: It’s a fair comparison of those software stacks running on those hardware configurations. If everyone accepted that and stopped there, all would be good. However, almost inevitably, people will read more into the scores than is appropriate.

In such a comparison, we’re changing multiple variables at once. We’ve written before about the effect of the software stack on performance. CloudReady and Chrome OS are two different implementations of the Chromium OS, and it’s possible that one is more efficient than the other. If so, that would affect CrXPRT scores. At the same time, the raw performance of the two hardware configurations under test could also differ to a certain degree, which would also affect CrXPRT scores.

Here’s a metaphor: If you measure the effective force at the end of two levers and find a difference, to what do you attribute that difference? If you know the levers are the same length, you can attribute the difference to the amount of applied force. If you know the applied force is identical, you can attribute the difference to the length of the levers. If you lack both of those data points, you can’t know whether the difference is due to the length, the force, or a combination of the two.

With a benchmark, you can run multiple experiments designed to isolate variables and use the results from those experiments to look for trends. For example, we could install both CloudReady OS and Chrome OS on the same Intel-based Chromebook and compare the CrXPRT results. Because that removes hardware differences as a variable, such an experiment would offer some insight into how the two implementations compare. However, because differences in hardware can affect the performance of a given piece of software, this single data point would be of limited value. We could repeat the experiment on a variety of other Intel-based Chromebooks, and other patterns might emerge. If one of the implementations consistently scored higher, that would suggest that it was more efficient than the other, but would still not be definitively conclusive.

I hope this gives you some idea about why we are cautious about drawing conclusions when comparing results from different sets of hardware running different software stacks.


Another pronunciation lesson

Knowing how to say the terms we read on-line can be a bit of a mystery. For example, it’s been 30 years since CompuServe created the GIF format, and people are still arguing about how to say it.

A couple of months ago, we talked about how to pronounce WebXPRT. In the video we pointed to, the narrator openly says he’s confused about how to say “XPRT.” For the record, it’s pronounced “expert.”

Recently, we came across another video, which referred to CrXPRT. The narrator pronounced it “Chrome expert.” The “expert” part is correct, but the “Chrome” part is not. It’s an understandable mistake, because Cr is the chemical symbol for chromium. That’s why we chose it! However, we pronounce the C and R individually. So, the name is said “C R expert.”

All that being said, it was great to see CrXPRT in the classroom! When we created CrXPRT, the education market was a big consideration, as you can tell from this CrXPRT video. We love seeing the XPRTs in the real world!


Evolve or die

Last week, Google announced that it would retire its Octane benchmark. Their announcement explains that they designed Octane to spur improvement in JavaScript performance, and while it did just that when it was first released, those improvements have plateaued in recent years. They also note that there are some operations in Octane that optimize Octane scores but do not reflect real-world scenarios. That’s unfortunate, because they, like most of us, want improvements in benchmark scores to mean improvements in end-user experience.

WebXPRT comes at the web performance issue differently. While Octane’s goal was to improve JavaScript performance, the purpose of WebXPRT is to measure performance from the end user’s perspective. By doing the types of work real people do, WebXPRT doesn’t measure only improvements in JavaScript performance; it also measures the quality of the real-world user experience. WebXPRT’s results also reflect the performance of the entire device and software stack, not just the performance of the JavaScript interpreter.

Google’s announcement reminds us that benchmarks have finite life spans, that they must constantly evolve to keep pace with changes in technology, or they will become useless. To make sure the XPRT benchmarks do just that, we are always looking at how people use their devices and developing workloads that reflect their actions. This is a core element of the XPRT philosophy.

As we mentioned last week, we’ve working on the next version of WebXPRT. If you have any thoughts about how it should evolve, let us know!


Check out the other XPRTs:

Forgot your password?