
Category: Benchmarking

Counting down

We’ve been hard at work since the end of the beta period, driving toward the release of HDXPRT 2012. Things are looking good. The RTM is coming soon, so we thought we’d share the next few milestones with you.

  • RTM candidate: 7/13/12. Development stops at this point; there will be no feature changes afterward.
  • HDXPRT 2012 launch: 7/27/12. Having tested the RTM and manufactured the DVDs, we mail the benchmark to the community members. This is when the press release goes out. It’s also when we publish the HDXPRT 2012 white paper. Unlike the design document, this paper will explain HDXPRT 2012 to the general public.
  • Webinar: 8/3/12. We talk about HDXPRT 2012 and take your questions.
  • First scaling paper: 8/31/12. As with HDXPRT 2011, we will publish a set of scaling studies. The first will test several modern desktop processors, looking at factors such as the amount of RAM, HDDs versus SSDs, and integrated graphics versus a couple of popular discrete graphics cards.
  • Second scaling paper: 9/28/12. In this paper, we will expand on the testing we did in the first scaling paper.

We will release an update to HDXPRT 2012 that supports Windows 8 and includes bug fixes within a month of the Windows 8 launch.

We’re very excited about the release of HDXPRT 2012, and look forward to seeing what you do with it!

Eric

Comment on this post in the forums

Tablets everywhere

Everyone wants in on the tablet market. This month, two software vendors announced their own tablets: Microsoft’s Surface and Google’s Nexus 7. In the past, both vendors relied on OEMs to build tablets around their software (Windows and Android), and both have met with limited success doing so.

Now, both are trying Apple’s strategy of controlling the hardware as well as the software. Unlike Apple, however, Microsoft and Google still need to work with their OEM partners. I’m looking forward to watching that delicate dance!

I’m looking forward even more, however, to actually playing with both of those products, and to using TouchXPRT on them. We haven’t given you an update on TouchXPRT in a while, but rest assured that we are hard at work on it. Once we have HDXPRT 2012 ready to go, we will give you more details on where we are with TouchXPRT and its current schedule. The touch and tablet markets are heating up, and we plan to be there for them. As we have indicated before, we will support Windows 8 Metro in the first version, but we see a real need for TouchXPRT to work on multiple platforms. So much to do!

Please note that today is the end of the beta test period. We appreciate the results, bugs, and suggestions you have sent so far. Feel free, however, to continue sending us feedback or issues you find even after the official beta period is complete. After today, we can’t guarantee that we’ll be able to address them, but we will try.

Bill

Comment on this post in the forums

Our new baby has a name!

At the beginning of the year, at CES, we announced that we would start working on a touch-based benchmark that would initially run on Windows 8 Metro. We have been hard at work learning about Metro and creating the benchmark itself.

In parallel, we’ve been working on a name for the benchmark. What we settled on was Touch eXperience & Performance Ratings Tool, or TouchXPRT for short. We’re updating the Web pages with the new name and getting the domain properly set up. In the meantime, check out the logo:

Let us know what you think about the name and the logo. We are happy with both!

I’ve been reading that the Windows 8 beta should be available soon, and we hope to have an alpha version of TouchXPRT available within a few weeks of the beta. We will need your help to critique, debug, and expand TouchXPRT from there. Hang onto your hats; these are exciting times in Benchmark Land!

Bill

Comment on this post in the forums

The real art of benchmarking

In my last blog entry, I noted the challenge of balancing real-world and real-science considerations when benchmarking Web page loads. That issue, however, is inherent in all benchmarking. Real world argues for benchmarks that emphasize what users and computers actually do. For servers, that might mean executing real database transactions against a real database from real client computers. For tablets, that might mean real fingers selecting and displaying real photos. There are obvious issues with both: setting up such a real database environment is difficult, and who wants to be the owner of the real fingers driving the tablet? It is also difficult to understand what causes performance differences: is it the network, the processors, or the disks in the server? There are more subtle challenges, too, such as how to make the tests work on servers or tablets other than the original ones. Worse, such real-world environments are subject to all sorts of repeatability and reproducibility issues.

Real science, on the other hand, argues for benchmarks that emphasize repeatable and reproducible results. Further, real science wants benchmarks that isolate the causes of performance differences. For servers, that might mean a suite of tests targeting processor speed, network bandwidth, and disk transfer rate. For tablets, that might mean tests targeting processor speed, touch responsiveness, and graphics-rendering rate. The problem is that it is not always obvious what combination of such factors actually delivers better database-server performance or a better tablet experience. Worse, different databases and transaction mixes might exhibit characteristics that these isolated tests don’t measure at all.
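To make the real-science style concrete, here is a minimal sketch, in Python, of the kind of isolated probes such a suite might contain. It is purely illustrative: the loop counts, file sizes, and probe choices are placeholder assumptions, not anything from an XPRT benchmark.

```python
import os
import time


def time_cpu(iterations=5_000_000):
    """Time a pure-CPU integer loop (a crude processor-speed probe)."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start


def time_disk(path="probe.bin", size_mb=64):
    """Time a sequential write with fsync (a crude disk-transfer probe)."""
    chunk = os.urandom(1024 * 1024)  # 1 MB of incompressible data
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed  # MB/s


if __name__ == "__main__":
    # Each probe isolates one factor; neither says anything about how the
    # factors combine to produce real application performance.
    print(f"CPU loop:   {time_cpu():.2f} s")
    print(f"Disk write: {time_disk():.1f} MB/s")
```

Each probe is perfectly repeatable and easy to compare across machines, which is exactly its appeal, and exactly its limitation: neither number tells you how a real workload will behave.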

The good news is that real world and real science are not always in opposition. The bad news is that a third factor exacerbates the situation: benchmarks take real time (and, of course, real money) to develop. That means benchmark developers need to make compromises if they want to bring tests to market before the real world they are attempting to measure has changed, and they need to avoid some of the most difficult technical hurdles. Like most things, that means trying to find the right balance between real world and real science.

Unfortunately, there is no formula for determining that balance. Instead, it really is something of an art. I’d love to hear from you about benchmarks, current or past, that you think strike this balance well and demonstrate the real art of benchmarking.

Bill

Comment on this post in the forums

Web benchmarking challenges

I think that an important part of any touch benchmark will be a Web component. After all, the always (or almost always) connected nature of these devices is a critical part of their identities. I think such a Web benchmark needs to include a measurement of page load speed (how long it takes to download and render a page).

Creating such a test seems straightforward: pick a set of sites, such as the five or ten most popular, and time how long the home page of each takes to load. The problem, however, is that those pages are constantly changing. Every few months, most popular sites do a major redesign, which would obviously affect the results and make it difficult to compare a current run with one from a few months back. An even bigger problem is that the page one user sees differs from the page another sees, because sites typically detect things like your location and your device and adjust the page to match. On top of that, the ads and the content of the site are constantly changing and updating. Even hitting Refresh can give you a different page.
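To make the moving-target problem concrete, here is a minimal sketch of the naive approach: fetch a live page several times and time each download. It is only an illustration; example.com stands in for a real popular site, and this probe measures download time only, not render time.

```python
import time
import urllib.request

# Placeholder URL; a real test would use the five or ten most popular sites.
URL = "https://example.com/"

for run in range(1, 6):
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=30) as resp:
        body = resp.read()
    elapsed = time.perf_counter() - start
    # Against a real site, both the byte count and the timing can differ
    # from run to run, for exactly the reasons described above.
    print(f"run {run}: {len(body)} bytes in {elapsed * 1000:.0f} ms")
```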

Given all of those problems, how is it possible to test page loads? One way is to create pages that are similar to those of leading Web sites in terms of size, amount of graphics, and dynamic elements. This allows the tests to be consistent over time and across devices and locations (or at least as consistent as the moment-to-moment variability of the Internet allows). The problem with this approach, however, is that the pages will age as real sites evolve, and they will never be the real sites.
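Here is a minimal sketch of that fixed-page alternative: serve an unchanging synthetic page from a local server and time fetches against it. The page body is a trivial stand-in; a real harness would host pages built to mimic the size and structure of leading sites, as described above.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A trivial stand-in for a synthetic test page; real test pages would
# include graphics and dynamic elements as well.
PAGE = b"<html><body>" + b"x" * 100_000 + b"</body></html>"


class FixedPageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep per-request logging out of the timing output


server = HTTPServer(("127.0.0.1", 0), FixedPageHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Because the page never changes, run-to-run differences now reflect the
# device and software stack under test rather than a moving target.
for run in range(1, 4):
    start = time.perf_counter()
    urllib.request.urlopen(url).read()
    print(f"run {run}: {(time.perf_counter() - start) * 1000:.1f} ms")

server.shutdown()
```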

Such are the tradeoffs in benchmarking. The key is how to balance real-science and real-world considerations. What do you think? Which approach strikes the better balance?

Bill

Comment on this post in the forums

An open, top-down process

We’ve been hard at work putting together the RFC for HDXPRT 2012. As a group of us sat around a table discussing what we’d like to see in the benchmark, it became clear to me how different this development process is from those of other benchmarks I’ve had a hand in creating (3D WinBench, Winstone, WebBench, NetBench, and many others). The big difference is not in the design or the coding or even the final product.

The difference is the process.

A sentiment that came up frequently in our meeting was “Sure, but we need to see what the community thinks.” That reflects a very different process from the ones I’m used to: different from how companies develop benchmarks and different from how benchmark committees work. What it represents, in a word, is openness. We want to include the Development Community in every step of the process, and we want to figure out how to make the process even more open over time. For example, we discussed ideas as radical as videoing our brainstorming sessions.

Another important part of the process is that we are trying to work top-down. Rather than deciding up front which applications should be in the benchmark, we want to start by asking how people really use high-definition media. What do people typically do with video? What do they do to create it, and how do they watch it? Similarly, what do people do with images and audio?

At least as importantly, we don’t want to include only our opinions and research on these questions; we want to pick your brains and get your input. From there, we will work on the workflows, the applications, and the RFC. Ultimately, that will lead to the scripts themselves. With your input and help, of course!

Please let us know any ideas you have for how to make the process even more open. And tell us what you think about this top-down approach. We’re excited and hope you are, too!

Bill

Comment on this post in the forums
