
Category: HDXPRT 2011 results

Sharing results

A few weeks back, I wrote about different types of results from benchmarks. HDXPRT 2011’s primary metric is an overall score. One of the challenges of a score, unlike a metric such as minutes of battery life, is that it is hard to interpret without context. Is 157 a good score? The use of a calibration, or base, system helps a bit, because if that system has a score of 100, then a 157 is definitely better. Still, two scores do not give you a lot of context.

To help make comparisons easier, we are releasing a set of results from our testing at http://hdxprt.com/hdxprt2011results. With the results viewer we’ve provided, you can sort the results on a variety of fields and filter them for matching text. We’ve included results from our beta testing and our results white papers.

We’ll continue to add results, but we want to invite members of the HDXPRT Development Community to do the same. We would especially like to get any results you have published on your Web sites. Please submit your results using this link: http://www.hdxprt.com/forum/2011resultsubmit. We’ll give them a sanity check and then include them in the results viewer. Thanks!

Bill

Comment on this post in the forums

Looking deeper into results

A few weeks ago, I mentioned some questions we had about graphics performance using HDXPRT 2011 after releasing our results white paper. The issue was that HDXPRT 2011 gave results I had not expected—the integrated graphics outperformed discrete graphics cards. I suspected that this was both because HDXPRT 2011’s lack of 3D work lessens the advantage of discrete graphics cards and because the integrated graphics on the second-generation Intel Core processors we used performed well.

We ran some tests with discrete graphics cards on an older processor (an Intel Core 2 Quad processor Q6600) and report our findings in a second results white paper. My suspicions were correct: On the older processor, the discrete graphics cards performed 21 to 36 percent better than the integrated graphics.

As an aside, we are looking into putting our test results on the Web site in some easy-to-access fashion so you can look at them in more detail. My hope is that doing so will facilitate sharing of results among all of us in the HDXPRT Development Community.

Based on this second results white paper, I would love to hear your responses to two questions. First, do you think that future versions of HDXPRT should include 3D graphics? Second, what other areas of HDXPRT 2011 would you like to see us look into?

Bill

Comment on this post in the forums

Scoring with HDXPRT

Two weeks ago, I began explaining how benchmarks keep score (http://www.hdxprt.com/blog/2011/08/17/keeping-score/). HDXPRT 2011 fundamentally measures the time a PC requires to complete a series of tasks, such as editing photos and converting videos from one format to another. It uses the times of three sets of tasks to come up with three use case times (Edit videos from your camcorder, Create memories from your digital camera, and Prepare media for on-the-go). Because an early version of the benchmark took too long to run, we trimmed the size of the workloads (such as the number of photos) so it would complete more quickly. We believed the original workload sizes were realistic, however, so we extrapolated (multiplied by the difference in size) what the times would have been. That process yields a time in minutes for each use case.

We could have simply combined the three times into one total time, but doing so would have created a score where smaller is better, which can be confusing. To avoid this, HDXPRT 2011 normalizes the three times to the times a calibration, or base, system required to complete the same work. The benchmark then calculates a geometric mean of those three normalized scores and multiplies that number by 100 to create the overall Create HD Score. This scoring method sets the calibration system’s score to 100 and makes it easy for you to compare multiple systems. For example, if PC A gets a score of 200, and PC B gets a 400, PC B is twice the speed of PC A (and four times the speed of the calibration system) at creating HD content.

The term “geometric mean” might be unfamiliar. One way to get benchmark geeks arguing is to ask about the correct mean for combining results. (Yes, there really are enough of us for an argument.) At the risk of inflaming my fellow benchmark geeks, I will give a quick summary of the main ways people combine results.

An arithmetic mean is a simple average, where you add all the numbers and divide by the number of numbers. It is good for combining amounts, such as gigabytes of RAM, across multiple computers.

A geometric mean is more mathematically complex. You compute it by multiplying all the numbers and then taking the nth root, where n is the number of numbers. This kind of mean is appropriate for combining normalized numbers. Its advantage over the arithmetic mean is that it keeps one really good number from drowning out all the others.

The final mean is the harmonic. You calculate it by dividing the number of numbers by the sum of the reciprocals (1 divided by each element). (If that makes little sense to you, don’t worry about it!) The harmonic mean is appropriate for combining rates, such as megabytes per second.

I should also mention one other result from HDXPRT 2011, the Overall Play HD Experience score. This is a very different kind of score that uses one to five stars to indicate the quality of three HD video playbacks. HDXPRT uses mean opinion scores (MOS) based on smoothness of playback to compute these results. (I’ll discuss MOS in more detail in a future blog.) With this kind of score, a four-star rating is better than a two-star rating, but it is hard to say how much better. The MOS research indicates that people would rate the four-star playback as good and the two-star playback as poor, but you can’t say that one is twice as good as the other because the relationship is not linear.

What do you think of the metrics that HDXPRT 2011 provides? Are there others you would find more useful or meaningful? Your input is vital to improving the benchmark and making sure it does what you want it to do.

Bill

Comment on this post in the forums

Always wanting to know more

I’m an engineer (computer science) by training, and as a consequence I’m always after more data.  More data means better understanding, which leads to better decision making.  We acquired a lot of data in the course of finishing our white paper on the characteristics of HDXPRT 2011.  Now, of course, I want even more.

The biggest area that I want to understand better is the graphics subsystem. Our testing showed processor-integrated graphics outperforming discrete graphics cards. That was not what I expected. There seem to be two likely explanations. The first is that since the workload of HDXPRT 2011 does not include 3D, discrete graphics cards are not that helpful to the benchmark’s applications. Certainly, 3D performance plays more to the traditional strengths of discrete graphics cards. The second likely explanation is that the integrated graphics on the second-generation Intel Core processors we used perform well. A number of performance Web sites have noted the same thing since the debut of those processors.

The answer is probably a combination of the two.

To satisfy my data desires, we’re going to look further. We’ll start by testing on some older processors as well as some different graphics cards.  We’ll share our findings with you.

Please let us know any other characteristics of HDXPRT 2011 that you’d like us to explore in more depth.  I can’t guarantee we’ll be able to look at everything, but I know I always want to know more!

Bill

Comment on this post in the forums

Sneak peek at the HDXPRT 2011 results white paper

After spending weeks testing different configurations with HDXPRT 2011, we are putting the final touches on a white paper detailing the results. I thought I’d give you a sneak peek at some of the things the tests revealed about the characteristics of HDXPRT 2011.

As I explained last week, trying to understand the characteristics of a benchmark requires careful testing while changing one component at a time. To do that, we ran the tests on a single system using an Intel DH67BL motherboard. We changed processors (both type and speed), the amount of RAM, the type of storage (hard disk and SSD), and the graphics subsystem, as well as a few other variables.

Here are a few of the things we found:

  • Processor speed – On an Intel Core i3, increasing the processor speed (GHz) 6.5% resulted in a 4.4% increase in the HDXPRT overall score. On an Intel Core i5, increasing the processor speed (GHz) 17.9% resulted in an 8.1% increase in the HDXPRT overall score. In short, processor speed matters, but the score scales somewhat less than linearly with clock speed (see the sketch after this list).
  • Memory – Increasing from 2 GB to 4 GB increased the overall score 10.7% on an Intel Core i5 and 15.8% on an Intel Core i7. However, increasing from 4 GB to 8 GB increased the score less than 2% on both processors. These results map pretty well to my personal experience: going to 4 GB is important for media-rich applications, but going to 8 GB is less so.
  • Disk drive – Switching from a hard disk to an SSD increased the overall score about 1%. While I would certainly prefer an SSD to a hard disk, this shows that, for HDXPRT 2011, disk performance has only a small influence on the results.

Many more details will be in the white paper we will publish in the next few days. Please be on the lookout for it and let us know what you think of the results and what they say about the characteristics of HDXPRT 2011.

We plan to conduct a Webinar in the near future to discuss the HDXPRT 2011 results white paper and to answer general questions. I hope to see you there!

Bill

Comment on this post in the forums

Benchmarking a benchmark

One of the challenges of any benchmark is understanding its characteristics. The goal of a benchmark is to measure performance under a defined set of circumstances. For system-level, application-oriented benchmarks, it isn’t always obvious how individual components in the system influence the overall score. For instance, how does doubling the amount of memory affect the benchmark score? The best way to understand the characteristics of a benchmark is to run a series of carefully controlled experiments that change one variable at a time. To test the benchmark’s behavior with increased memory, you would take a system and run the benchmark with different amounts of RAM. Changing the processor, graphics subsystem, or hard disk lets you see the influence of those components. Some components, like memory, can change in both their amount and speed.

The full matrix of system components to test can quickly grow very large. While the goal is to change only one component at a time, this is not always possible. For example, you can’t change the processor from an Intel to an AMD without also changing the motherboard.

We are in the process of putting HDXPRT 2011 through a series of such tests. HDXPRT 2011 is a system-level, application-oriented benchmark for measuring the performance of PCs on consumer-oriented HD media scenarios. We want to understand, and share with you, how different components influence HDXPRT scores. We expect to release a report on our findings next week. It will include results detailing the effect of processor speed, amount of RAM, hard disk type, and graphics subsystem.

There is a tradeoff between the size of the matrix and how long it takes to produce the results. We’ve tried to choose the areas we felt were most important, but we’d like to hear what you consider important. So, what characteristics of HDXPRT 2011 would you like to see us test?

Bill

Comment on this post in the forums
