
Category: What makes a good benchmark?

How we evaluate new WebXPRT workload proposals

A key value of the BenchmarkXPRT Development Community is our openness to user feedback. Whether it’s positive feedback about our benchmarks, constructive criticism, ideas for completely new benchmarks, or proposed workload scenarios for existing benchmarks, we appreciate your input and give it serious consideration.

We’re currently accepting ideas and suggestions for ways we can improve WebXPRT 4. We are open to adding both non-workload features and new auxiliary tests, which can be experimental or targeted workloads that run separately from the main test and produce their own scores. You can read more about experimental WebXPRT 4 workloads here. A recent user question about possible WebGPU workloads prompted us to explain the types of parameters we consider when evaluating a new WebXPRT workload proposal.

Community interest and real-life relevance

The first two parameters we use when evaluating a WebXPRT workload proposal are straightforward: are people interested in the workload, and is it relevant to real life? We originally developed WebXPRT to evaluate device performance using the types of web-based tasks that people are likely to encounter daily, and real-life relevance continues to be an important criterion for us during development. There are many technologies, functions, and use cases that we could test in a web environment, but only some of them are both relevant to common applications or usage patterns and likely to be interesting to lab testers and tech reviewers.

Maximum cross-platform support

Currently, WebXPRT runs in almost any web browser, on almost any device that has a web browser, and we would ideally maintain that broad level of cross-platform support when introducing new workloads. However, technical differences in the ways that different browsers execute tasks mean that some types of scenarios would be impossible to include without breaking our cross-platform commitment.
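The WebGPU question mentioned above is a good illustration of the issue: WebGPU is not yet available in every major browser, so any workload built on it would have to detect support and bow out gracefully where it is missing. The sketch below is not WebXPRT code, just an example of the kind of check such a workload would need; it uses navigator.gpu, the standard WebGPU entry point, which browsers without WebGPU simply do not expose.

  // Illustration only: feature detection a WebGPU workload would need.
  // navigator.gpu is the standard WebGPU entry point; it is undefined in
  // browsers that do not support WebGPU.
  async function canRunWebGpuWorkload(): Promise<boolean> {
    const gpu = (navigator as any).gpu;   // cast avoids needing WebGPU type definitions
    if (!gpu) {
      return false;                       // browser has no WebGPU support at all
    }
    const adapter = await gpu.requestAdapter();
    return adapter !== null;              // the API may exist without a usable adapter
  }

A workload that has to skip itself on a large share of browsers and devices would be hard to include in the main test without breaking that commitment.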

One reason we’re considering auxiliary workloads for WebXPRT, such as a battery life rundown, is that those workloads would allow WebXPRT to offer additional value to users while maintaining the cross-platform nature of the main test. Even if a battery life test ran on only one major browser, it could still be very useful to many people.

Performance differentiation

Computer benchmarks such as the XPRTs exist to provide users with reliable metrics that they can use to gauge how well target platforms or technologies perform certain tasks. With a broadly targeted benchmark such as WebXPRT, if the workloads are so heavy that most devices can’t handle them, or so light that most devices complete them without being taxed, the results will have little to no use for OEM labs, the tech press, or independent users when evaluating devices or making purchasing decisions.

Consequently, with any new WebXPRT workload, we try to find a sweet spot in terms of how demanding it is. We want it to run on a wide range of devices—from low-end devices that are several years old to brand-new high-end devices and everything in between. We also want users to see a wide range of workload scores and resulting overall scores, so they can easily grasp the different performance capabilities of the devices under test.

Consistency and replicability

Finally, workloads should produce scores that consistently fall within an acceptable margin of error and are easy to replicate with additional testing or on comparable gear. Some web technologies are very sensitive to uncontrollable or unpredictable variables, such as internet speed. A workload that measures one of those technologies would be unlikely to produce results that are consistent and easily replicated.
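To make that concrete, one simple way to think about consistency is the run-to-run spread of a workload’s scores. The sketch below is an illustration, not part of WebXPRT, and the 2 percent threshold is an arbitrary placeholder rather than an official criterion; it computes the coefficient of variation across repeated runs of a hypothetical workload.

  // Illustration only: quantify run-to-run consistency as the coefficient of
  // variation (standard deviation as a percentage of the mean) of repeated scores.
  function coefficientOfVariation(scores: number[]): number {
    const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
    const variance = scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / scores.length;
    return (Math.sqrt(variance) / mean) * 100;
  }

  const runs = [231, 228, 233, 230, 229];   // hypothetical scores from five repeated runs
  const cv = coefficientOfVariation(runs);
  console.log(`CV: ${cv.toFixed(2)}%`, cv < 2 ? "acceptably consistent" : "too noisy");

A workload whose scores swing widely between identical runs on identical hardware would fail this kind of check no matter how interesting the underlying technology is.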

We hope this post will be useful for folks who are contemplating potential new WebXPRT workloads. If you have any general thoughts about browser performance testing, or specific workload ideas that you’d like us to consider, please let us know.

Justin

We want your thoughts about experimental WebXPRT 4 workloads

Two weeks ago, we discussed how users can automate WebXPRT 4 testing by appending several parameters and values to the benchmark’s URL. One of these lets you enable any available experimental workloads during the test run. While we don’t currently offer any experimental workloads for WebXPRT 4, we are seeking suggestions for possible future workload scenarios, or specific web technologies that you’d like to be able to test with an experimental workload.
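As a rough illustration of what that looks like in practice, a tester’s script might build the test URL from key-value pairs, one of which turns experimental workloads on. The URL and parameter names below are placeholders for the sake of the example, not the actual WebXPRT 4 automation syntax; the post from two weeks ago covers the real parameters.

  // Illustration only: placeholder URL and parameter names, not the real
  // WebXPRT 4 automation syntax. The pattern is the point: append key-value
  // pairs to the benchmark URL, including one that enables experimental workloads.
  const baseUrl = "https://example.com/webxprt4/";   // placeholder URL
  const params = new URLSearchParams({
    experimental: "1",      // placeholder for "enable experimental workloads"
    result: "download",     // placeholder for how results are returned
  });
  console.log(`${baseUrl}?${params.toString()}`);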

The main purpose of optional, experimental workloads would be to test cutting-edge browser technologies or new use cases, even if the experimental workload doesn’t work on all browsers or devices. The individual scores for the experimental workloads would stand alone, and would not factor in the WebXPRT 4 overall score. WebXPRT 4 testers would be able to run the experimental workloads one of two ways: by adjusting a value in the WebXPRT 4 automation scripts, as mentioned above, or by manually selecting them on the benchmark’s home screen.

Testers would benefit from experimental workloads by learning how well certain browsers or systems handle new tasks (e.g., new web apps or AI capabilities). We would benefit from fielding workloads for large-scale testing and user feedback before we commit to including them as core WebXPRT workloads.

Do you have any general thoughts about experimental workloads for browser performance testing, or any specific workloads that you’d like us to consider? Please let us know.

Justin

The XPRTs: What would you like to see?

One of the core principles of the BenchmarkXPRT Development Community is a commitment to valuing the feedback of both community members and the larger group of testers who use the XPRTs on a regular basis. That feedback helps us to ensure that as the XPRTs continue to grow and evolve, the resources that we offer will continue to meet the needs of those who use them.

In the past, user feedback has influenced specific aspects of our benchmarks such as the length of test runs, user interface features, results presentation, and the removal or inclusion of specific workloads. More broadly, we have also received suggestions for entirely new XPRTs and ways we might target emerging technologies or industry use cases.

As we approach the second half of 2022 and begin planning for 2023, we’d like to hear your ideas about new XPRTs, or new features for existing XPRTs. Are you aware of hardware form factors, software platforms, or prominent applications that are difficult or impossible to evaluate using existing performance benchmarks? Are there new technologies we should incorporate into existing XPRTs via new workloads? Can you recommend ways to improve any of the XPRTs or XPRT-related tools, such as results viewers?

We are interested in your answers to these questions and any other ideas you have, so please feel free to contact us. We look forward to hearing your thoughts!

Justin

Chrome OS support for CrXPRT apps ends in June 2022

Last March, we discussed the Chrome OS team’s original announcement that they would be phasing out support for Chrome Apps altogether in June 2021, and would shift their focus to Chrome extensions and Progressive Web Apps. The Chrome OS team eventually extended support for existing Chrome Apps through June 2022, but as of this week, we see no indication that they will further extend support for Chrome Apps published with general developer accounts. If the end-of-life schedule for Chrome Apps does not change in the next few months, both CrXPRT 2 and CrXPRT 2015 will stop working on new versions of Chrome OS at some point in June.

To maintain CrXPRT functionality past June, we would need to rebuild the app completely—either as a Progressive Web App or in some other form. For this reason, we want to reassess our approach to Chrome OS testing, and investigate which features and technologies to include in a new Chrome OS benchmark. Our current goal is to gather feedback and conduct exploratory research over the next few months, and begin developing an all-new Chrome OS benchmark for publication by the end of the year.

While we will discuss ideas for this new Chrome OS benchmark in future blog posts, we welcome ideas from CrXPRT users now. What features or workloads would you like the new benchmark to retain? Would you like us to remove any components from the existing benchmark? Does the battery life test in its current form suit your needs? If you have any thoughts about these questions or any other aspects of Chrome OS benchmarking, please let us know!

Justin

Why we don’t control screen brightness during CrXPRT 2 battery life tests

Recently, we had a discussion with a community member about why we no longer recommend specific screen brightness settings during CrXPRT 2 battery life tests. In the CrXPRT 2015 user manual, we recommended setting the test system’s screen brightness to 200 nits. Because the amount of power that a system directs to screen brightness can have a significant impact on battery life, we believed that pegging screen brightness to a common standard for all test systems would yield apples-to-apples comparisons.

After extensive experience with CrXPRT 2015 testing, we decided not to recommend a standard screen brightness for CrXPRT 2, for the following reasons:

  • A significant number of Chromebooks cannot produce a screen brightness of 200 nits. A few higher-end models can do so, but they are not representative of most Chromebooks. Some Chromebooks, especially those that many school districts and corporations purchase in bulk, cannot produce a brightness of even 100 nits.
  • Because of the point above, adjusting screen brightness would not represent real-life conditions for most Chromebooks, and the battery life results could mislead consumers who want to know the battery life they can expect with default out-of-box settings.
  • Most testers, and even some labs, do not have light meters, and the simple brightness percentages that the operating system reports produce different degrees of brightness on different systems. For testers without light meters, a standardized screen brightness recommendation could discourage them from running the test.
  • The brightness controls for some low-end Chromebooks lack the fine-tuning capability that is necessary to standardize brightness between systems. In those cases, an increase or decrease of one notch can swing brightness by 20 to 30 nits in either direction. This could also discourage testing by leading people to believe that they lack the capability to correctly run the test.

In situations where testers want to compare battery life using standardized screen brightness, we recommend using light meters to set the brightness levels as closely as possible. If the brightness levels between systems vary by more than a few nits, and if the levels vary significantly from out-of-box settings, any published battery life results should include a full disclosure and explanation of the test conditions.

For the majority of testers without light meters, running the CrXPRT 2 battery life test with default screen brightness settings on each system provides a reliable and accurate estimate of the type of real-world, out-of-box battery life consumers can expect.

If you have any questions or comments about the CrXPRT 2 battery life test, please feel free to contact us!

Justin

Thinking about experimental WebXPRT workloads in 2022

As the WebXPRT 4 development process has progressed, we’ve started to discuss the possibility of offering experimental WebXPRT 4 workloads in 2022. These would be optional workloads that test cutting-edge browser technologies or new use cases. The individual scores for the experimental workloads would stand alone, and would not factor in the WebXPRT 4 overall score.

WebXPRT testers would be able to run the experimental workloads one of two ways: by manually selecting them on the benchmark’s home screen, or by adjusting a value in the WebXPRT 4 automation scripts.

Testers would benefit from experimental workloads by being able to compare how well certain browsers or systems handle new tasks (e.g., new web apps or AI capabilities). We would benefit from fielding workloads for large-scale testing and user feedback before we commit to including them as core WebXPRT workloads.

Do you have any general thoughts about experimental workloads for browser performance testing, or any specific workloads that you’d like us to consider? Please let us know.

Justin
