Category: What makes a good benchmark?

The WebXPRT 3 results calculation white paper is now available

on May 31, 2018

As we’ve discussed in prior blog posts, transparency is a core value of our open development community. A key part of being transparent is explaining how we design our benchmarks, why we make certain development decisions, and how the benchmarks actually work. This week, to help WebXPRT 3 testers understand how the benchmark calculates results, we published the WebXPRT 3 results calculation and confidence interval white paper.

The white paper explains what the WebXPRT 3 confidence interval is, how it differs from typical benchmark variability, and how the benchmark calculates the individual workload scenario and overall scores. The paper also provides an overview of the statistical techniques WebXPRT uses to translate raw times into scores.

To supplement the white paper’s overview of the results calculation process, we’ve also published a spreadsheet that shows the raw data from a sample test run and reproduces the calculations WebXPRT uses.

The paper and spreadsheet are both available on WebXPRT.com and on our XPRT white papers page. If you have any questions about the WebXPRT results calculation process, please let us know, and be sure to check out our other XPRT white papers.

Justin

Posted in Benchmark metrics, Benchmarking, BenchmarkXPRT, BenchmarkXPRT development community, Browser-based benchmarks, Performance benchmarking, Web-based testing, WebXPRT, WebXPRT 3, What makes a good benchmark? | Tagged benchmark, WebXPRT, WebXPRT 3, white paper |

The value of speed

By Bill Catchings

on May 3, 2018

I was reading an interesting article on how high-end smartphones like the iPhone X, Pixel 2 XL, and Galaxy S8 generate more money from in-game revenue than cheaper phones do.

One line stood out to me: “With smartphones becoming faster, larger and more capable of delivering an engaging gaming experience, these monetization key performance indicators (KPIs) have begun to increase significantly.”

It turns out the game companies totally agree with the rest of us that faster devices are better!

Regardless of who is seeking better performance—consumers or game companies—the obvious question is how you determine which models are fastest. Many folks rely on device vendors’ claims about how much faster the new model is. Unfortunately, the vendors’ claims don’t always specify on what they base the claims. Even when they do, it’s hard to know whether the numbers are accurate and applicable to how you use your device.

The key part of any answer is performance tools that are representative, dependable, and open.

Representative – Performance tools need to have realistic workloads that do things that you care about.
Dependable – Good performance tools run reliably and produce repeatable results, both of which require that significant work go into their development and testing.
Open – Performance tools that allow people to access the source code, and even contribute to it, keep things above the table and reassure you that you can rely on the results.

Our goal with the XPRTs is to provide performance tools that meet all these criteria. WebXPRT 3 and all our other XPRTs exist to help accurately reveal how devices perform. You can run them yourself or rely on the wealth of results that we and others have collected on a wide array of devices.

The best thing about good performance tools is that everyone, even vendors, can use them. I sincerely hope that you find the XPRTs helpful when you make your next technology purchase.

Bill

Posted in Benchmark metrics, Benchmarking, Benchmarking computing devices, Mobile devices, Performance benchmarking, Performance of computing devices, Phones, WebXPRT, WebXPRT 3, What makes a good benchmark? |

AIXPRT: We want your feedback!

By Justin Greene

on April 12, 2018

Today, we’re publishing the AIXPRT Request for Comments (RFC) document. The RFC explains the need for a new artificial intelligence (AI)/machine learning benchmark, shows how the BenchmarkXPRT Development Community plans to address that need, and provides preliminary design specifications for the benchmark.

We’re seeking feedback and suggestions from anyone interested in shaping the future of machine learning benchmarking, including those not currently part of the Development Community. Usually, only members of the BenchmarkXPRT Development Community have access to our RFCs and the opportunity to provide feedback. However, because we’re seeking input from non-members who have expertise in this field, we will be posting this RFC in the New events & happenings section of the main BenchmarkXPRT.com page and making it available at AIXPRT.com.

We welcome input on all aspects of the benchmark, including scope, workloads, metrics and scores, UI design, and reporting requirements. We will accept feedback through May 13, 2018, after which BenchmarkXPRT Development Community administrators will collect and evaluate the feedback and publish the final design specification.

Please share the RFC with anyone interested in machine learning benchmarking and please send us your feedback before May 13.

Justin

Posted in AI, Benchmark metrics, Benchmarking, BenchmarkXPRT, BenchmarkXPRT development community, Collaborative benchmark development, Future of performance evaluation, Machine learning, Performance benchmarking, What makes a good benchmark? |

Comparing open source and open development

By Justin Greene

on April 5, 2018

Why do we use open development when designing and building the XPRTs, and what’s the difference between our open development approach and traditional open-source methods? The terminology around these two models can be confusing, so we wanted to review some similarities and differences.

Why open development?

An open development approach helps encourage collaboration, innovation, and transparency. XPRT community members get involved in the development of each benchmark from the beginning:

They submit suggestions, questions, and concerns that inform the future design of the tools.
They view early proposals for new versions and contribute comments for the final design.
They suggest new workloads.
They have access to community previews (beta builds) of the tools.
They submit source code for inclusion in the benchmarks.
They examine existing source code.

A commitment to transparency

Because we’re committed to publishing reliable, unbiased benchmarks, we also want make the XPRT development process as transparent as possible. It’s not unusual for people to claim that any given benchmark contains hidden biases. To address this problem, we make our source code available to anyone who joins the community. This approach reduces the risk of unforeseen bias in our benchmarks.

Quality control

Unlike open-source models, open development allows us to control derivative works, which can be important in benchmarking. While open source encourages a constantly evolving product that may fork into substantially different versions, benchmarking requires a product that remains static to enable valid comparisons over time. By controlling derivative works, we can avoid the problem of unauthorized versions of the benchmarks being published as “XPRTs.”

In the future, we may use a traditional open-source model for specific XPRTs or other projects. If we do, we’ll share our reasoning with the community and ask for their thoughts about the best way to proceed. If you’re not a community member, but are interested in benchmark development, we encourage you to join today!

Justin

Posted in BenchmarkXPRT development community, Collaborative benchmark development, Open development, Open Source, What makes a good benchmark? |

Just before showtime

By Justin Greene

on March 1, 2018

In case you missed the announcement, WebXPRT 3 is now live! Please try it out, submit your test results, and feel free to send us your questions or comments.

During the final push toward launch day, it occurred to us that not all of our readers are aware of the steps involved in preparing a benchmark for general availability (GA). Here’s a quick overview of what we did over the last several weeks to prepare for the WebXPRT 3 release, a process that follows the same approach we use for all new XPRTs.

After releasing the community preview (CP), we started on the final build. During this time, we incorporated features that we were not able to include in the CP and fixed a few outstanding issues. Because we always try to make sure that CP results are comparable to eventual GA results, these issues rarely involve the workloads themselves or anything that affects scoring. In the case of WebXPRT 3, the end-of-test results submission form was not fully functional in the CP, so we finished making it ready for prime time.

The period between CP and GA releases is also a time to incorporate any feedback we get from the community during initial testing. One of the benefits of membership in the BenchmarkXPRT Development Community is access to pre-release versions of new benchmarks, along with an opportunity to make your voice heard during the development process.

When the GA candidate build is ready, we begin two types of extensive testing. First, our quality assurance (QA) team performs a thorough review, running the build on numerous devices. In the case of WebXPRT, it also involves testing with multiple browsers. The QA team also keeps a sharp eye out for formatting problems and bugs.

The second type of testing involves comparing the current version of the benchmark with prior versions. We tested WebXPRT 3 on almost 100 devices. While WebXPRT 2015 and WebXPRT 3 scores are not directly comparable, we normalize scores for both sets of results and check that device performance is scaling in the same way. If it isn’t, we need to determine why not.

Finally, after testing is complete and the new build is ready, we finalize all related documentation and tie the various pieces together on the web site. This involves updating the main benchmark page and graphics, the FAQ page, the results tables, and the members’ area.

That’s just a brief summary of what we’ve been up to with WebXPRT in the last few weeks. If you have any questions about the XPRTs or the development community, feel free to ask!

Justin

Posted in Benchmarking, BenchmarkXPRT development community, Browser-based benchmarks, Collaborative benchmark development, Community Preview, Cross-platform benchmarks, WebXPRT, What makes a good benchmark? |

Nothing to hide

By Eric Hale

on November 9, 2017

I recently saw an article in ZDNet by my old friend Steven J. Vaughan-Nichols that talks about how NetMarketShare and StatCounter reported a significant jump in the operating system market shares for Linux and Chrome OS. One frustration Vaughan-Nichols alluded to in the article is the lack of transparency into how these firms calculated market share, so he can’t gauge how reliable they are. Because neither NetMarketShare nor StatCounter disclosed their methods, there’s no sure way for interested observers to verify the numbers. Steven prefers the data from the federal government’s Digital Analytics Program (DAP). DAP makes its data freely available, so you can run your own calculations. Transparency generates trust.

Transparency is a core value for the XPRTs. We’ve written before about how statistics can be misleading. That’s why we’ve always disclosed exactly how the XPRTs calculate performance results, and the way BatteryXPRT calculates battery life. It’s also why we make each XPRT’s source code available to community members. We want to be open and honest about how we do things, and our open development community model fosters the kind of constructive feedback that helps to continually improve the XPRTs.

We’d love for you to be a part of that process, so if you have questions or suggestions for improvement, let us know. If you’d like to gain access to XPRT source code and previews of upcoming benchmarks, today is a great day to join the community!

Eric

Posted in Battery life, BatteryXPRT 2014 for Android, BenchmarkXPRT development community, Chrome OS, Collaborative benchmark development, What makes a good benchmark? |

Category: What makes a good benchmark?

The WebXPRT 3 results calculation white paper is now available

The value of speed

AIXPRT: We want your feedback!

Comparing open source and open development

Why open development?

A commitment to transparency

Quality control

Just before showtime

Nothing to hide

Check out the other XPRTs: