
Category: Application-based benchmarks

Benchmarking a benchmark

One of the challenges of any benchmark is understanding its characteristics. The goal of a benchmark is to measure performance under a defined set of circumstances. For system-level, application-oriented benchmarks, it isn’t always obvious how individual components in the system influence the overall score. For instance, how does doubling the amount of memory affect the benchmark score? The best way to understand the characteristics of a benchmark is to run a series of carefully controlled experiments that change one variable at a time. To test the benchmark’s behavior with increased memory, you would take a system and run the benchmark with different amounts of RAM. Changing the processor, graphics subsystem, or hard disk lets you see the influence of those components. Some components, like memory, can change in both their amount and speed.

The full matrix of system components to test can quickly grow very large. While the goal is to change only one component at a time, this is not always possible. For example, you can’t change the processor from an Intel model to an AMD model without also changing the motherboard.
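To make the size of that matrix concrete, here is a minimal sketch in Python that compares testing every combination of components against changing one component at a time from a baseline. The component names and option values are hypothetical placeholders, not the configurations used for HDXPRT 2011.

```python
from itertools import product

# Hypothetical component options (assumed for illustration only).
# The first entry in each list is treated as the baseline.
options = {
    "processor": ["cpu_a", "cpu_b", "cpu_c"],
    "ram_gb": [2, 4, 8],
    "disk": ["hdd", "ssd"],
    "graphics": ["integrated", "discrete"],
}

def full_matrix(options):
    """Every combination of every component option."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]

def one_at_a_time(options):
    """The baseline plus variants that each change a single component."""
    baseline = {name: values[0] for name, values in options.items()}
    configs = [baseline]
    for name, values in options.items():
        for value in values[1:]:
            configs.append({**baseline, name: value})
    return configs

full = full_matrix(options)
oat = one_at_a_time(options)
print(f"full matrix: {len(full)} runs, one at a time: {len(oat)} runs")
```

With these made-up options, the full matrix needs 36 runs while the one-at-a-time plan needs 7. The gap widens quickly as you add components or options.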

We are in the process of putting HDXPRT 2011 through a series of such tests. HDXPRT 2011 is a system-level, application-oriented benchmark for measuring the performance of PCs on consumer-oriented HD media scenarios. We want to understand, and share with you, how different components influence HDXPRT scores. We expect to release a report on our findings next week. It will include results detailing the effect of processor speed, amount of RAM, hard disk type, and graphics subsystem.

There is a tradeoff between the size of the matrix and how long it takes to produce the results. We’ve tried to choose the areas we felt were most important, but we’d like to hear what you consider important. So, what characteristics of HDXPRT 2011 would you like to see us test?

Bill


Putting HDXPRT in some benchmark context

Benchmarks come in many shapes and sizes. Some are extremely small, simple, and focused, while others are large, complex, and cover many aspects of a system. To help position HDXPRT in the world of benchmarks, let me share with you a little taxonomy that Bill and I have long used. No taxonomy is perfect, of course, but we’ve found this one to be very helpful as a general categorization tool.

From the perspective of how benchmarks measure performance, you can divide most of them into three groups.

Inspection tools use highly specialized tests to target very particular parts of a system. Back in the day, lo these many decades ago (okay, it was only two decades, but in dog years two tech decades is like five generations), some groups used a simple no-op loop to measure processor performance. I know, it sounds dumb today, but for a short time many felt it was a legitimate measure of processor clock speed, which is one aspect of performance. Similarly, if you wanted to know how fast a graphics subsystem could draw a particular kind of line, you could write code to draw lines of that type over and over.

These tools have very limited utility, because they don’t do what real users do, but for people working close to hardware, they can be useful.
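As a rough illustration of how narrow an inspection tool can be, here is a minimal sketch of a no-op loop timer in Python. The function name `time_noop_loop` and the iteration count are assumptions made for this example. In an interpreted language the result mostly reflects interpreter overhead rather than raw clock speed, which only underlines how little such a number says about real work.

```python
import time

def time_noop_loop(iterations):
    """Return the seconds taken to run an empty loop `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        pass  # deliberately does nothing
    return time.perf_counter() - start

elapsed = time_noop_loop(1_000_000)
print(f"1,000,000 empty iterations took {elapsed:.4f} s")
```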

Moving closer to the real world, synthetic benchmarks are specially written programs that simulate the kinds of work their developers believe real users are doing. So, if you think your target users are spending all day in email, you could write your own mini email client and time functions in it. These tools definitely move closer to real user work than inspection tools, but they still have the drawback of not actually running the programs real people are using.

Application-based benchmarks take that last step by using real applications, the same programs that users employ in the real world. These benchmarks cause those applications to perform the kinds of actions that real users take, and they time those actions. You can always argue about how representative they are (more on that in a future blog entry, assuming I don’t forget to write it), but they are definitely closer to the real world because they’re using real applications.

With all of that background, HDXPRT becomes easy to classify: it’s an application-based benchmark.

Mark Van Name

