Category: HDXPRT metrics

Principled Technologies and the BenchmarkXPRT Development Community release HDXPRT 4, a benchmark designed to show how well Windows devices handle real-world media tasks

Durham, NC, February 25 — Principled Technologies and the BenchmarkXPRT Development Community have released HDXPRT 4, a free benchmark that gives objective information about how well Windows 10 devices handle common media-creation tasks. HDXPRT 4 uses real commercial applications, like Photoshop and MediaEspresso, to perform tasks based on three everyday scenarios: photo editing, video conversion, and music editing. After the test is finished, the tool provides an overall measure by generating a single performance score. Anyone can go to HDXPRT.com to compare existing performance results on a variety of devices, or to download the app for themselves.

“When we started working on HDXPRT 4, we knew we wanted to create a benchmark that accurately reflects the kind of work average consumers do when creating content on their PCs,” said Bill Catchings, co-founder of Principled Technologies, which administers the BenchmarkXPRT Development Community. “HDXPRT delivers clear results that make sense to the wide audience of buyers shopping for new Windows systems.”

HDXPRT is part of the BenchmarkXPRT suite of performance evaluation tools, which includes WebXPRT, MobileXPRT, TouchXPRT, CrXPRT, and BatteryXPRT. The XPRTs help users get the facts before they buy, use, or evaluate tech products such as computers, tablets, and phones.

To learn more about the BenchmarkXPRT Development Community, go to www.BenchmarkXPRT.com.

About Principled Technologies, Inc.
Principled Technologies, Inc. is a leading provider of technology marketing, as well as learning and development services. It administers the BenchmarkXPRT Development Community.

Principled Technologies, Inc. is located in Durham, North Carolina, USA. For more information, please visit www.PrincipledTechnologies.com.

Company Contact
Justin Greene
BenchmarkXPRT Development Community
Principled Technologies, Inc.
1007 Slater Road, Ste. 300
Durham, NC 27704

BenchmarkXPRTsupport@PrincipledTechnologies.com

HDXPRT 4 is here!

We’re excited to announce that HDXPRT 4 is now available to the public! Just like previous versions of HDXPRT, HDXPRT 4 uses trial versions of commercial applications to complete real-world media tasks. The HDXPRT 4 installation package includes installers for some of those programs, such as Audacity and HandBrake. For other programs, such as Adobe Photoshop Elements and CyberLink MediaEspresso, users will need to download the necessary installers prior to testing by using the links and instructions in the HDXPRT 4 User Manual.

In addition to the photo-editing, music-editing, and video-conversion workloads from prior versions of the benchmark, HDXPRT 4 includes two new Photoshop Elements scenarios. The first uses an AI tool that corrects closed eyes in photos, and the second creates a single panoramic photo from seven separate photos.

HDXPRT 4 is compatible with systems running Windows 10, and is available for download at HDXPRT.com. The installation package is about 4.8 GB, so the download may take several minutes. The setup process takes about 30 minutes on most computers, and a standard test run takes approximately an hour.

After trying out HDXPRT 4, please submit your scores here and send any comments to BenchmarkXPRTsupport@principledtechnologies.com. To see test results from a variety of devices, go to HDXPRT.com and click View Results. We look forward to seeing your results!

Working towards Windows 8

This past Wednesday, Bill hosted a Webinar to discuss HDXPRT 2012. He covered a lot of material. We’ll make a recording of it available on the site fairly soon.

During the Webinar, Bill mentioned that we’re working on a patch to let HDXPRT run on Windows 8. We have begun testing this patch. However, given the high level of interest in the community about testing HDXPRT on Windows 8, we are going to offer the patch on Friday to any community members who want to try it on an “as is” basis.

Using the patch is straightforward. Having installed HDXPRT on a Windows 8 system, you copy a few files to the HDXPRT\Bin folder, run a DOS script, and reboot. At that point, HDXPRT should run on the Windows 8 system. We will include detailed instructions with the download.

The patch should have no impact on the scores. This means you can compare results from Windows 8 systems with the results you already have from Windows 7 systems.

We hope that you will try HDXPRT on Windows 8 and let us know what you see. We’ll use your feedback as we finalize the update of HDXPRT 2012 that will fully support Windows 8.

When the update is available, we’ll post to the community forum, tweet, and put a notice on the Web page.

In other news, there’s a post on the forum that gives instructions for getting more detailed timing information from HDXPRT. Community members can read that post here: How to get more detailed timing information from HDXPRT 2012

Finally, the comment period for HDXPRT 2013 starts October 1. Be thinking about what you’d like to see in HDXPRT 2013!

Eric

Update: The prerelease Patch for Windows 8 is now available. You can download it here.

What to do with all the times

HDXPRT, like most other application-based benchmarks, works by timing lots of individual operations. Some other benchmarks just time the entire script. The downside of that approach is that the time includes things that are constant regardless of the speed of the underlying hardware. Some things, like how fast a menu drops down or text scrolls, are tied to the user experience and should not go faster on a faster system. Including those items in the overall time dilutes the importance of the operations that we wait on and are frustrated by, the operations we need to time.

In the case of HDXPRT 2011, we time between 20 and 30 operations. We then roll these up into the times we report as well as the overall score. We do not, however, report the individual times. We expect to include even more timed operations in HDXPRT 2012. As we have been thinking about what the right metrics are, we have started to wonder what to do with all of those times. We could total up the times of similar operations and create additional results. For example, we could total up all the application load times and produce an application-load result. Or, we could total up all the times for an individual application and produce an application result. I can definitely see value in results like those.
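To make the roll-up idea concrete, here is a minimal sketch of totaling individual operation times by application and by kind of operation. The application names, operation names, and timings are all made up for illustration; they are not HDXPRT's actual operations or data format.

```python
from collections import defaultdict

# Hypothetical timed operations: (application, operation kind, seconds).
# These names and values are invented for illustration only.
timings = [
    ("PhotoApp", "load", 4.0),
    ("PhotoApp", "apply_filter", 12.5),
    ("VideoApp", "load", 6.0),
    ("VideoApp", "convert", 90.0),
]


def total_by(records, key_index):
    """Sum the seconds of all records, grouped by the field at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


# An "application result": total time spent in each application.
per_application = total_by(timings, 0)

# An "application-load result" falls out of grouping by operation kind.
per_operation_kind = total_by(timings, 1)

print(per_application)
print(per_operation_kind["load"])
```

The same list of raw times supports both views, which is one reason exposing the individual times could be useful even if the benchmark reports only a few headline results.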

Another possibility is to look at the general pattern of the results to understand responsiveness. One way would be to collect the times in a histogram, where buckets correspond to ranges of response times for the operations. Such a histogram might give a sense of how responsive a target system feels to an end user. There are certainly other possibilities as well.
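As a rough sketch of the histogram idea, the snippet below counts how many operation times fall into each response-time bucket. The bucket edges (1, 5, and 30 seconds) and the sample times are assumptions chosen only for illustration; the benchmark does not define them.

```python
from bisect import bisect_right


def response_histogram(times_s, edges_s):
    """Count times per bucket.

    edges_s must be sorted. Bucket 0 holds times below edges_s[0],
    bucket i holds times in [edges_s[i-1], edges_s[i]), and the last
    bucket holds times at or above edges_s[-1].
    """
    counts = [0] * (len(edges_s) + 1)
    for t in times_s:
        counts[bisect_right(edges_s, t)] += 1
    return counts


# Hypothetical bucket edges in seconds: under 1 s, 1-5 s, 5-30 s, 30 s and up.
edges = [1.0, 5.0, 30.0]

# Hypothetical operation times in seconds.
times = [0.4, 0.9, 2.0, 4.5, 12.0, 45.0]

counts = response_histogram(times, edges)
print(counts)
```

A system whose counts pile up in the first bucket or two would presumably feel snappier than one with many operations in the slowest bucket, even if their total times were similar.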

If nothing else, I think it makes sense to expose these times in some way. If we make them available, I’m confident that people will find ways to use them. My concern is the danger of burdening a benchmark with too many results. The engineer in me loves all the data possible. The product designer in me knows that focus is critical. Successful benchmarks have one or maybe two results. How to balance the two?

One wonder of this benchmark development community is the ability to ask you what you think. What would you prefer, simple and clean or lots of numbers? Maybe a combination where we just have the high-level results we have now, but also make other results or times available in an “expert” or an “advanced” mode? What do you think?

Bill

Sharing results

A few weeks back, I wrote about different types of results from benchmarks. HDXPRT 2011’s primary metric is an overall score. One of the challenges of a score, unlike a metric such as minutes of battery life, is that it is hard to interpret without context. Is 157 a good score? The use of a calibration, or base, system helps a bit, because if that system has a score of 100, then a 157 is definitely better. Still, two scores do not give you a lot of context.

To help make comparisons easier, we are releasing a set of results from our testing at http://hdxprt.com/hdxprt2011results. With the results viewer we’ve provided, you can sort the results on a variety of fields and filter them for matching text. We’ve included results from our beta testing and our results white papers.

We’ll continue to add results, but we want to invite members of the HDXPRT Development Community to do the same. We would especially like to get any results you have published on your Web sites. Please submit your results using this link: http://www.hdxprt.com/forum/2011resultsubmit. We’ll give them a sanity check and then include them in the results viewer. Thanks!

Bill

Scoring with HDXPRT

Two weeks ago, I began explaining how benchmarks keep score (http://www.hdxprt.com/blog/2011/08/17/keeping-score/). HDXPRT 2011 fundamentally measures the time a PC requires to complete a series of tasks, such as editing photos and converting videos from one format to another. It uses the times of three sets of tasks to come up with three use case times (Edit videos from your camcorder, Create memories from your digital camera, and Prepare media for on-the-go). Because an early version of the benchmark took too long to run, we trimmed the size of the workloads (such as the number of photos) to make it complete more quickly. Because we believed the size of the original workloads was realistic, we extrapolated what the time would have been at the original size by multiplying the measured time by the ratio of the original workload size to the trimmed size. That process results in times in minutes.
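For readers who like to see the arithmetic, here is a small sketch of that extrapolation. It assumes a simple linear scaling by workload size, and the function name, photo counts, and timing are hypothetical; the exact factors HDXPRT 2011 uses are not given here.

```python
def extrapolate_minutes(measured_seconds, original_size, trimmed_size):
    """Scale a measured time up to the original workload size, in minutes.

    Assumes time grows linearly with workload size (an assumption made
    for this sketch, not a statement of HDXPRT's exact method).
    """
    scale = original_size / trimmed_size
    return measured_seconds * scale / 60.0


# Hypothetical example: 50 photos took 120 seconds, and the original
# workload had 200 photos.
estimate = extrapolate_minutes(120, original_size=200, trimmed_size=50)
print(estimate)
```

In this made-up case the workload was trimmed to a quarter of its original size, so the two measured minutes become an estimated eight.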

We could have simply combined the three times into one total time, but doing so would have created a score where smaller is better, which can be confusing. To avoid this, HDXPRT 2011 normalizes the three times to the times a calibration, or base, system required to complete the same work. The benchmark then calculates a geometric mean of those three normalized scores and multiplies that number by 100 to create the overall Create HD Score. This scoring method sets the calibration system’s score to 100 and makes it easy for you to compare multiple systems. For example, if PC A gets a score of 200, and PC B gets a 400, PC B is twice the speed of PC A (and four times the speed of the calibration system) at creating HD content.
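Here is a minimal sketch of that calculation. It assumes each use case is normalized as calibration time divided by system time, so that a faster system gets a larger number; the function name and the sample times are made up for illustration.

```python
import math


def create_hd_score(system_minutes, calibration_minutes):
    """Geometric mean of the normalized use case times, multiplied by 100.

    Assumption for this sketch: each ratio is calibration time divided
    by system time, so a faster system scores higher.
    """
    ratios = [
        cal_time / sys_time
        for sys_time, cal_time in zip(system_minutes, calibration_minutes)
    ]
    geometric_mean = math.prod(ratios) ** (1.0 / len(ratios))
    return 100.0 * geometric_mean


# Hypothetical use case times in minutes for the calibration system.
calibration = [10.0, 20.0, 30.0]

# The calibration system compared with itself scores 100.
base_score = create_hd_score(calibration, calibration)

# A hypothetical system that finishes every use case in half the time
# scores about 200.
fast_score = create_hd_score([5.0, 10.0, 15.0], calibration)

print(base_score, fast_score)
```

Because every ratio is relative to the same calibration system, the ratio of two scores tells you directly how much faster one system is than the other on this work.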

The term “geometric mean” might be unfamiliar. One way to get benchmark geeks arguing is to ask about the correct mean for combining results. (Yes, there really are enough of us for an argument.) At the risk of inflaming my fellow benchmark geeks, I will give a quick summary of the main ways people combine results.

An arithmetic mean is a simple average, where you add all the numbers and divide by the number of numbers. It is good for combining amounts, such as gigabytes of RAM, across multiple computers.

A geometric mean is more mathematically complex. You compute it by multiplying all the numbers and then taking the nth root, where n is the number of numbers. This kind of mean is appropriate for combining normalized numbers. Its advantage over the arithmetic mean is that it keeps one really good number from drowning out all the others.

The final mean is the harmonic. You calculate it by dividing the number of numbers by the sum of the reciprocals of the numbers (1 divided by each element). (If that makes little sense to you, don’t worry about it!) The harmonic mean is appropriate for combining rates, such as megabytes per second.
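If you would rather see the three means side by side, this short sketch computes each one using its standard definition (the harmonic mean is the count divided by the sum of the reciprocals). The sample values are invented: two normalized results of 1.0 and one standout result of 4.0.

```python
import math


def arithmetic_mean(values):
    """Add the numbers and divide by how many there are."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Multiply the numbers and take the nth root."""
    return math.prod(values) ** (1.0 / len(values))


def harmonic_mean(values):
    """Divide the count by the sum of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical normalized results: two ordinary, one really good.
normalized = [1.0, 1.0, 4.0]

print(arithmetic_mean(normalized))  # pulled up the most by the 4.0
print(geometric_mean(normalized))   # pulled up less
print(harmonic_mean(normalized))    # pulled up the least
```

With these numbers the arithmetic mean is 2.0, while the geometric mean is roughly 1.59 and the harmonic mean roughly 1.33, which shows how the geometric mean keeps one really good number from dominating the combined result.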

I should also mention one other result from HDXPRT 2011, the Overall Play HD Experience score. This is a very different kind of score that uses one to five stars to indicate the quality of three HD video playbacks. HDXPRT uses mean opinion scores (MOS) based on smoothness of playback to compute these results. (I’ll discuss MOS in more detail in a future blog.) With this kind of score, a four-star rating is better than a two-star rating, but it is hard to say how much better. The MOS research indicates that people would rate the four-star playback as good and the two-star playback as poor, but you can’t say that one is twice as good as the other because the relationship is not linear.

What do you think of the metrics that HDXPRT 2011 provides? Are there others you would find more useful or meaningful? Your input is vital to improving the benchmark and making sure it does what you want it to do.

Bill
