BenchmarkXPRT Blog banner

Tag Archives: BenchmarkXPRT Development Community

Accessing the WebXPRT 4 source code

If you’re new to the XPRTs, you may not be aware that we provide free access to XPRT benchmark source code. Publishing XPRT source code is part of our commitment to making the XPRT development process as transparent as possible. By allowing interested parties to access and review our source code, we’re encouraging openness and honesty in the benchmarking industry. We’re also inviting constructive feedback that can help ensure that the XPRTs continue to improve and contribute to a level playing field for all the types of products they measure.

While we do offer free access to the XPRT source code, we’ve decided to offer the code upon request instead of using a permanent download link. This approach prevents bots or other malicious actors from downloading the code. It also has the benefit of allowing us to interact with users who are interested in the source code and answer any questions they may have. We’re always keen to learn more about what others are thinking about the XPRTs and the types of work they measure.

We recently received some questions about accessing the WebXPRT 4 source code, which made us realize that we needed to make a clearer way for people to ask for the code. In response, we added a “Request WebXPRT 4 source code” link to the gray Helpful Info box on WebXPRT.com (see it in the screenshot below). Clicking the link will allow you to email the BenchmarkXPRT Support team directly and request the code.

After we receive your request, we’ll send you a secure link to the current WebXPRT 4 build package. For those users who wish to set up a local instance of WebXPRT 4 for their own internal testbeds, the package will contain all the necessary files and installation instructions. We allow folks to set up their own instances for purposes of review, internal testing, or experimentation, but we ask that users publish only test results from the official WebXPRT 4 site.

While we offer free access to XPRT source code, our approach to derivative works differs from some traditional open-source models that encourage developers to change products and even take them in different directions. Because benchmarking requires a product that remains static to enable valid comparisons over time, we allow people to download the source, but we reserve the right to control derivative works. This discourages a situation where someone publishes an unauthorized version of the benchmark and calls it an “XPRT.”

If you have any questions about accessing the WebXPRT 4 source code, let us know!

Justin

WebXPRT benchmarking tips from the XPRT lab

Occasionally, we receive inquiries from XPRT users asking for help determining why two systems with the same hardware configuration are producing significantly different WebXPRT scores. This can happen for many reasons, including different software stacks, but score variability can also result from different testing behaviors and environments. While some degree of variability is normal, these types of questions provide us with an opportunity to talk about some of the basic benchmarking practices we follow in the XPRT lab to produce the most consistent and reliable scores.

Below, we list a few basic best practices you might find useful in your testing. Most of them relate to evaluating browser performance with WebXPRT, but several of these practices apply to other benchmarks as well.

  • Hardware is not the only important factor: Most people know that different browsers produce different performance scores on the same system. Testers are not, however, always aware of shifts in performance between different versions of the same browser. While most updates don’t have a large impact on performance, a few updates have increased (or even decreased) browser performance by a significant amount. For this reason, it’s always important to record and disclose the extended browser version number for each test run. The same principle applies to any other relevant software.
  • Keep a thorough record of system information: We record detailed information about a test system’s key hardware and software components, including full model and version numbers. This information is not only important for later disclosure if we choose to publish a result, it can also sometimes help to pinpoint system differences that explain why two seemingly identical devices are producing very different scores. We also want people to be able to reproduce our results to the closest extent possible, so that commitment involves recording and disclosing more detail than you’ll find in some tech articles and product reviews.
  • Test with clean images: We typically use an out-of-box (OOB) method for testing new devices in the XPRT lab. OOB testing means that other than running the initial OS and browser version updates that users are likely to run after first turning on the device, we change as little as possible before testing. We want to assess the performance that buyers are likely to see when they first purchase the device and before they install additional software. This is the best way to provide an accurate assessment of the performance retail buyers will experience from their new devices. That said, the OOB method is not appropriate for certain types of testing, such as when you want to compare as close to identical system images as possible, or when you want to remove as much pre-loaded software as possible.
  • Turn off automatic updates: We do our best to eliminate or minimize app and system updates after initial setup. Some vendors are making it more difficult to turn off updates completely, but you should always double-check update settings before testing.
  • Get a baseline for system processes: Depending on the system and the OS, a significant amount of system-level activity can be going on in the background after you turn it on. As much as possible, we like to wait for a stable baseline (idle time) of system activity before kicking off a test. If we start testing immediately after booting the system, we often see higher variance in the first run before the scores start to tighten up.
  • Use more than one data point: Because of natural variance, our standard practice in the XPRT lab is to publish a score that represents the median from three to five runs, if not more. If you run a benchmark only once and the score differs significantly from other published scores, your result could be an outlier that you would not see again under stable testing conditions or over the course of multiple runs.


We hope these tips will help make your testing more accurate. If you have any questions about WebXPRT, the other XPRTs, or benchmarking in general, feel free to ask!

Justin

A bit of house cleaning at BenchmarkXPRT.com

When we’ve released a new version of an XPRT benchmark app, it’s been our practice for many years to maintain a link to the previous version on the benchmark’s main page. For example, visitors can start on the WebXPRT 4 homepage at WebXPRT.com and follow links to access WebXPRT 3, WebXPRT 2015, and WebXPRT 2013. Historically, we’ve maintained these links because labs and tech reviewers usually take a while to introduce a new benchmark to their testing suite. Continued access to the older benchmarks also allows users to quickly compare new devices to old devices without retesting everything.

That being said, several of the XPRT pages currently contain links to benchmarks that we no longer actively support. Some of those benchmarks still function correctly, and testers occasionally use them, but a few no longer work on the latest versions of the operating systems or browsers that we designed them to test. While we want to continue to provide a way for longtime XPRT users to access legacy XPRTs,  we also want to avoid potential confusion for new users. We believe our best way forward is to archive older tests in a separate part of the site.

In the coming weeks, we’ll be moving several legacy XPRT benchmarks to an archive section of the site. Once the new section is ready, we’ll link to it from the Extras drop-down menu at the top of BenchmarkXPRT.com. The benchmarks will still be available in the archive, but we will not actively support them or directly link to them from the homepages of active XPRTs.

During this process, we’ll move the following benchmarks to the archive section:

  • WebXPRT 2015 and 2013
  • CrXPRT 2015
  • HDXPRT 2014
  • TouchXPRT 2014
  • MobileXPRT 2015 and 2013

If you have any questions or concerns about the archive process or access to legacy XPRTs, please let us know

Justin

Exploring the XPRT white paper library

As part of our commitment to publishing reliable, unbiased benchmarks, we strive to make the XPRT development process as transparent as possible. In the technology assessment industry, it’s not unusual for people to claim that any given benchmark contains hidden biases, so we take preemptive steps to address this issue by publishing XPRT benchmark source code, detailed system disclosures and test methodologies, and in-depth white papers. Today, we’re focusing on the XPRT white paper library.

The XPRT white paper library currently contains 21 white papers that we’ve published over the last 12 years. We started publishing white papers to provide XPRT users with more information about how we design our benchmarks, why we make certain development decisions, and how the benchmarks work. If you have questions about any aspect of one of the XPRT benchmarks, the white paper library is a great place to find some answers.

For example, the Exploring WebXPRT 4 white paper describes the design and structure of WebXPRT 4, including detailed information about the benchmark’s harness, HTML5 and WebAssembly (WASM) capability checks, and the structure of the performance test workloads. It also includes explanations of the benchmark’s scoring methodology, how to automate tests, and how to submit results for publication.

The companion WebXPRT 4 results calculation white paper explains the formulas that WebXPRT 4 uses to calculate the individual workload scenario scores and overall score, provides an overview of the statistical techniques WebXPRT uses to translate raw timings into scores, and explains the benchmark’s confidence interval and how it differs from typical benchmark variability. To supplement the white paper’s discussion of the results calculation process, we published a results calculation spreadsheet that shows the raw data from a sample test run and reproduces the exact calculations WebXPRT uses to produce test scores.

We hope that the XPRT white paper library will prove to be a useful resource for you. If you have questions about any of our white papers, or suggestions for topics that you’d like us to cover in possible future white papers, please let us know!

Justin

The role of potential WebXPRT 4 auxiliary workloads

As we mentioned in our most recent blog post, we’re seeking suggestions for ways to improve WebXPRT 4. We’re open to the prospect of adding both non-workload features and new auxiliary tests, e.g., a battery life or WebGPU-based graphics test scenario.

To prevent any confusion among WebXPRT 4 testers, we want to reiterate that any auxiliary workloads we might add will not affect existing WebXPRT 4 subtest or overall scores in any way. Auxiliary tests would be experimental or targeted workloads that run separately from the main test and produce their own scores. Current and future WebXPRT 4 results will be comparable to one another, so users who’ve already built a database of WebXPRT 4 scores will not have to retest their devices. Any new tests will be add-ons that allow us to continue expanding the rapidly growing body of published WebXPRT 4 test results while making the benchmark even more valuable to users overall.

If you have any thoughts about potential browser performance workloads, or any specific web technologies that you’d like to test, please let us know.

Justin

How we evaluate new WebXPRT workload proposals

A key value of the BenchmarkXPRT Development Community is our openness to user feedback. Whether it’s positive feedback about our benchmarks, constructive criticism, ideas for completely new benchmarks, or proposed workload scenarios for existing benchmarks, we appreciate your input and give it serious consideration.

We’re currently accepting ideas and suggestions for ways we can improve WebXPRT 4. We are open to adding both non-workload features and new auxiliary tests, which can be experimental or targeted workloads that run separately from the main test and produce their own scores. You can read more about experimental WebXPRT 4 workloads here. However, a recent user question about possible WebGPU workloads has prompted us to explain the types of parameters that we consider when we evaluate a new WebXPRT workload proposal.

Community interest and real-life relevance

The first two parameters we use when evaluating a WebXPRT workload proposal are straightforward: are people interested in the workload and is it relevant to real life? We originally developed WebXPRT to evaluate device performance using the types of web-based tasks that people are likely to encounter daily, and real-life relevancy continues to be an important criterion for us during development. There are many technologies, functions, and use cases that we could test in a web environment, but only some of them are both relevant to common applications or usage patterns and likely to be interesting to lab testers and tech reviewers.

Maximum cross-platform support

Currently, WebXPRT runs in almost any web browser, on almost any device that has a web browser, and we would ideally maintain that broad level of cross-platform support when introducing new workloads. However, technical differences in the ways that different browsers execute tasks mean that some types of scenarios would be impossible to include without breaking our cross-platform commitment.

One reason that we’re considering auxiliary workloads with WebXPRT, e.g., a battery life rundown, is that those workloads would allow WebXPRT to offer additional value to users while maintaining the cross-platform nature of the main test. Even if a battery life test ran on only one major browser, it could still be very useful to many people.

Performance differentiation

Computer benchmarks such as the XPRTs exist to provide users with reliable metrics that they can use to gauge how well target platforms or technologies perform certain tasks. With a broadly targeted benchmark such as WebXPRT, if the workloads are so heavy that most devices can’t handle them, or so light that most devices complete them without being taxed, the results will have little to no use for OEM labs, the tech press, or independent users when evaluating devices or making purchasing decisions.

Consequently, with any new WebXPRT workload, we try to find a sweet spot in terms of how demanding it is. We want it to run on a wide range of devices—from low-end devices that are several years old to brand-new high-end devices and everything in between. We also want users to see a wide range of workload scores and resulting overall scores, so they can easily grasp the different performance capabilities of the devices under test.

Consistency and replicability

Finally, workloads should produce scores that consistently fall within an acceptable margin of error, and are easily to replicate with additional testing or comparable gear. Some web technologies are very sensitive to uncontrollable or unpredictable variables, such as internet speed. A workload that measures one of those technologies would be unlikely to produce results that are consistent and easily replicated.

We hope this post will be useful for folks who are contemplating potential new WebXPRT workloads. If you have any general thoughts about browser performance testing, or specific workload ideas that you’d like us to consider, please let us know.

Justin

Check out the other XPRTs:

Forgot your password?