
Category: HDXPRT development process

Looking deeper into results

A few weeks ago, I mentioned some questions we had about graphics performance using HDXPRT 2011 after releasing our results white paper. The issue was that HDXPRT 2011 gave results I had not expected—the integrated graphics outperformed discrete graphics cards. I suspected that this was both because HDXPRT 2011’s lack of 3D work lessens the advantage of discrete graphics cards and because the integrated graphics on the second-generation Intel Core processors we used performed well.

We ran some tests with discrete graphics cards on an older processor (an Intel Core 2 Quad processor Q6600) and report our findings in a second results white paper. My suspicions were correct: On the older processor, the discrete graphics cards performed 21 to 36 percent better than the integrated graphics.

As an aside, we are looking into putting our test results on the Web site in some easy-to-access fashion so you can look at them in more detail. My hope is that doing so will facilitate sharing of results among all of us in the HDXPRT Development Community.

Based on this second results white paper, I would love to hear your responses to two questions. First, do you think that future versions of HDXPRT should include 3D graphics? Second, what other areas of HDXPRT 2011 would you like to see us look into?

Bill

Comment on this post in the forums

Scoring with HDXPRT

Two weeks ago, I began explaining how benchmarks keep score (http://www.hdxprt.com/blog/2011/08/17/keeping-score/). HDXPRT 2011 fundamentally measures the time a PC requires to complete a series of tasks, such as editing photos and converting videos from one format to another. It uses the times of three sets of tasks to come up with three use case times (Edit videos from your camcorder, Create memories from your digital camera, and Prepare media for on-the-go). Because an early version of the benchmark took too long to run, we trimmed the size of the workloads (such as the number of photos) to make it complete more quickly. Because we believed the size of the original workloads was realistic, we extrapolated what the time would have been by multiplying each measured time by the ratio of the original workload size to the trimmed size. That process results in times in minutes.
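To make the extrapolation step concrete, here is a minimal Python sketch. It is an illustration only, not HDXPRT's actual code: the function name and the sample numbers are hypothetical, and it assumes that run time grows linearly with workload size.

```python
def extrapolate_minutes(measured_seconds, original_size, trimmed_size):
    """Estimate the time, in minutes, that the original workload would take.

    Assumes time scales linearly with workload size (an assumption of
    this sketch, not a documented property of the benchmark).
    """
    if trimmed_size <= 0:
        raise ValueError("trimmed_size must be positive")
    scale = original_size / trimmed_size  # how much larger the original workload was
    return measured_seconds * scale / 60.0


# Hypothetical example: the trimmed workload of 50 photos took 90 seconds,
# and the original workload had 200 photos.
estimated = extrapolate_minutes(90, original_size=200, trimmed_size=50)
print(f"Estimated time for the original workload: {estimated:.1f} minutes")
```

With these made-up numbers, the workload is four times larger, so the estimate is four times the measured time, converted to minutes.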

We could have simply combined the three times into one total time, but doing so would have created a score where smaller is better, which can be confusing. To avoid this, HDXPRT 2011 normalizes the three times to the times a calibration, or base, system required to complete the same work. The benchmark then calculates a geometric mean of those three normalized scores and multiplies that number by 100 to create the overall Create HD Score. This scoring method sets the calibration system’s score to 100 and makes it easy for you to compare multiple systems. For example, if PC A gets a score of 200, and PC B gets a 400, PC B is twice the speed of PC A (and four times the speed of the calibration system) at creating HD content.
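The scoring steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the benchmark's real implementation: the function name and the use case times are hypothetical, and the sketch assumes each normalized score is the calibration system's time divided by the tested system's time, so that a faster system earns a larger number.

```python
import math


def create_hd_score(system_minutes, calibration_minutes):
    """Combine use case times into a single higher-is-better score.

    Each time is normalized against the calibration system (assumed here
    to be calibration time / system time), the normalized values are
    combined with a geometric mean, and the result is multiplied by 100.
    """
    if not system_minutes or len(system_minutes) != len(calibration_minutes):
        raise ValueError("need the same, nonzero number of times for both systems")
    ratios = [cal / t for t, cal in zip(system_minutes, calibration_minutes)]
    geometric_mean = math.prod(ratios) ** (1.0 / len(ratios))
    return 100.0 * geometric_mean


# Hypothetical use case times in minutes (not real HDXPRT results).
calibration = [10.0, 20.0, 30.0]
print(create_hd_score(calibration, calibration))       # calibration system vs. itself
print(create_hd_score([5.0, 10.0, 15.0], calibration)) # a system twice as fast on every use case
```

Under this assumption, the calibration system scores 100 against itself, and a system that finishes every use case in half the time scores about 200.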

The term “geometric mean” might be unfamiliar. One way to get benchmark geeks arguing is to ask about the correct mean for combining results. (Yes, there really are enough of us for an argument.) At the risk of inflaming my fellow benchmark geeks, I will give a quick summary of the main ways people combine results.

An arithmetic mean is a simple average, where you add all the numbers and divide by the number of numbers. It is good for combining amounts, such as gigabytes of RAM, across multiple computers.

A geometric mean is more mathematically complex. You compute it by multiplying all the numbers and then taking the nth root, where n is the number of numbers. This kind of mean is appropriate for combining normalized numbers. Its advantage over the arithmetic mean is that it keeps one really good number from drowning out all the others.

The final mean is the harmonic mean. You calculate it by dividing the number of numbers by the sum of the reciprocals of the numbers (that is, the sum of 1 divided by each number). (If that makes little sense to you, don’t worry about it!) The harmonic mean is appropriate for combining rates, such as megabytes per second.
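For readers who prefer code to words, here is a small Python sketch of the three means. The function names and sample values are made up for illustration. The harmonic mean here divides the count by the sum of the reciprocals of the values, which is the standard definition.

```python
import math


def arithmetic_mean(values):
    """Add all the numbers and divide by how many there are."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Multiply all the numbers and take the nth root (positive values only)."""
    return math.prod(values) ** (1.0 / len(values))


def harmonic_mean(values):
    """Divide the count by the sum of the reciprocals (nonzero values only)."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical normalized results: two ordinary numbers and one really good one.
scores = [1.0, 1.0, 4.0]
print(f"arithmetic: {arithmetic_mean(scores):.3f}")
print(f"geometric:  {geometric_mean(scores):.3f}")
print(f"harmonic:   {harmonic_mean(scores):.3f}")
```

With these sample values, the arithmetic mean is pulled furthest toward the one large number, the geometric mean less so, and the harmonic mean least of all. Python's standard `statistics` module also provides `geometric_mean` and `harmonic_mean` if you would rather not write your own.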

I should also mention one other result from HDXPRT 2011, the Overall Play HD Experience score. This is a very different kind of score that uses one to five stars to indicate the quality of three HD video playbacks. HDXPRT uses mean opinion scores (MOS) based on smoothness of playback to compute these results. (I’ll discuss MOS in more detail in a future blog.) With this kind of score, a four-star rating is better than a two-star rating, but it is hard to say how much better. The MOS research indicates that people would rate the four-star playback as good and the two-star playback as poor, but you can’t say that one is twice as good as the other because the relationship is not linear.

What do you think of the metrics that HDXPRT 2011 provides? Are there others you would find more useful or meaningful? Your input is vital to improving the benchmark and making sure it does what you want it to do.

Bill


Helping hands

We ran into a problem last week with HDXPRT 2011. Basically, it would fail when we installed it. One of the biggest problems for application-based benchmarks like HDXPRT 2011 is dealing with existing applications on the system. Even more difficult to account for are the many DLLs, drivers, and Registry settings that can collide between applications and different versions of the same application.

After a lot of effort, we found the problem was indeed a conflict between some of the pre-installed software on the system and the HDXPRT 2011 installer. We were able to narrow down which applications caused the problem and posted on the site some instructions for how to work around the issues. (For more details, log into the forum and then see http://www.hdxprt.com/forum/showthread.php?18-Troubleshooting-Installation-problems-on-Dell-Latitude-notebooks. You won’t be able to read that message if you’re not logged in.)

My hope is that if you run into issues with HDXPRT 2011, you’ll share them. And, share the workarounds you find as well! So, please let us know any tips, tricks, or issues you find with the benchmark by sending email to hdxprtsupport@hdxprt.com. The more we work together, the better we can make both HDXPRT 2011 and the future versions. Thanks!

Next week, we’ll return to looking at the results HDXPRT 2011 provides.

Bill


What to do, what to do

When you set out to build an application-based benchmark like HDXPRT, you face many choices, but two are particularly important:  what applications do you run, and what functions do you perform in each application?

With HDXPRT the answers were straightforward, as they should be.

The applications we chose reflected a blend of market leaders, those providing emerging but important features, and the input from our community members.

The functions we perform in each application are ones that are representative of common uses of those programs—and that reflect the input of the community.

What’s so important here is the last clause of each of those paragraphs:  your input defines this benchmark.

As we finish off HDXPRT 2011 and then move to the 2012 version, we’ll begin the development cycle anew. When we do, if you want to make sure we choose the applications and functions that matter most to you, then participate, tell us what you want, let us hear your voice.  We will respond to all input, so though we can’t guarantee to accept all direction—after all, goals and desires sometimes conflict—we can guarantee that you will hear back from us and that we will explain the rationale for our decisions.

Mark Van Name


An example of the community in action

Last week, I hosted a Webinar on HDXPRT. We’ll make a recording of it available on the site fairly soon. Multiple members attended. As I was going through the slides and discussing various aspects of the benchmark, a member asked about installing the benchmark from a USB key or a server. My response was the simple truth: we hadn’t considered that approach. As I then elaborated, we clearly should have thought about it, because those capabilities would be useful in just about every production lab out there, including ours here at PT. I concluded by saying that we’d look into it.

I’m not naming the member simply because with big companies I’m never sure if doing that will be good or will cause someone trouble, and I don’t want to cause hassle for anyone. He should, though, feel free to step forward and claim the well-deserved credit for the suggestion.

Less than a week after the Webinar, I’m happy to be able to report that the team has done more than look into these capabilities; it’s implemented them! So, the next Beta release, Beta 2, which we’ll be releasing any time now (maybe even before we post this blog entry), lets you install the benchmark from a network share or a USB key.

I know this is a relatively small thing, but I think it bears reporting because it is exactly the way the community should work. A member brought the benefits of his experience to bear in a great bit of feedback, and now the benchmark is better for it—and so are all of us who use it.

Keep the good ideas coming!

Mark Van Name


Our community’s goal

Computer system performance evaluation has a long and complex history. Many of the earliest tests were simple, short code snippets, such as Whetstone, that did little more than give an indication of how fast a particular computer subsystem was able to operate. Unfortunately, such simple benchmarks quickly lost their value, in part because they were very crude measures, and in part because the software tools on the systems being measured, such as compilers, could easily optimize for them. In some cases, a compiler could even recognize a test and “optimize” the code by simply producing the final result!

Over time, though, benchmarks have become more complex and more relevant. Whole organizations exist and have existed to build benchmarks. Notable ones include the Ziff-Davis Benchmark Operation (ZDBOp), which the Ziff-Davis computer magazines funded in the 1990s and which Mark and I ran; the Standard Performance Evaluation Corporation (SPEC), which its member companies fund and of which PT is a member; and the Business Applications Performance Corporation (BAPCo), which its member companies fund. Each of these organizations has developed widely used products, such as Winstone (ZDBOp), SPEC CPU (SPEC), and SYSmark (BAPCo). Each organization has also always faced challenges. In the case of ZDBOp, for example, Ziff-Davis could no longer support the costs of developing its benchmarks, so it discontinued the group. SPEC continues to develop good benchmarks, but its process can sometimes yield years between versions.

The goal with HDXPRT and the HDXPRT Development Community (HDC) is to explore a new way to develop benchmarks. By utilizing the expertise and experience of a community of interested people, we hope to be able to develop benchmarks in an open and collaborative environment while keeping them timely.

HDXPRT 2011 is the first test of this approach. We believe that it and subsequent versions of it, as well as other benchmarks, will give the industry a new model for creating world-class performance measurement tools.

If you’re not a member of the HDC, please consider joining us and helping define the future of performance evaluation.

Bill

