3DMark
Acoustic testing
Aerospike Database
AI/ML testing: Image classification
AI/ML testing: Image segmentation
AI/ML testing: Language processing
AI/ML testing: LLM testing with PTChatterly
AI/ML testing: Object detection
AI/ML testing: Recommendations
AI/ML testing: Speech recognition
Analytics with TPROC-H
Application verification
Backup and recovery studies (end-user)
Backup and recovery studies (enterprise)
Basemark GPU
Battery life testing
Bayes (HiBench)
Benchcraft
Blender
Cinebench
CrossMark
CrXPRT
Data reduction testing
Decision support workload
Deployment studies (on end-user devices)
Deployment studies (enterprise)
DISKSPD
Durability testing (end-user)
Durability testing (enterprise)
Feature comparison
Frametest
Geekbench
HDXPRT
I/O testing with FIO
I/O testing with Iometer
In-game benchmarks
Intel Edge Insights for Industrial
iPerf
Jetstream
K-means
LAMMPS
Manageability studies
Maxon Redshift
Microphone quality testing
Migration studies
MobileMark
MobileXPRT
NoSQL with YCSB
OLTP with DVD Store
OLTP with TPROC-C
Onshape
PassMark
PCMark
Power efficiency testing
Procyon benchmark suite
PugetBench
Random Forest
SLOB
Spark LDA
Spark LR
Speaker loudness & quality testing
SPECfp
SPECint
SPECviewperf
SPECworkstation
Speedometer
Speedtest
SYSmark
Task-based time testing
TeraSort
Thermal testing (end-user)
Thermal testing (enterprise)
Thin client testing
TouchXPRT
Unigine benchmarks
VM and container density testing
VMFleet
Vdbench and HCIBench
VDI testing with Login Enterprise
Video AI
Video encoding
Weathervane
WebXPRT
WordCount
WordPress
See more examples of our hands-on testing
OLTP with TPROC-C
Simulate transaction processing
The TPROC-C benchmark runs a transaction processing workload that uses five types of transactions: receiving a customer order, recording a payment, delivering an order, checking an order’s status, and checking stock in inventory.1 Though these transactions mimic the everyday work of a warehouse, results from TPROC-C are useful across many industries.
The tool outputs its results in new orders per minute, or NOPM.
HammerDB derived this workload from the specifications for the TPC-C benchmark. TPROC-C is not, however, a full implementation of official TPC-C standards, so any TPROC-C results are not directly comparable to published TPC-C results.
Analytics with TPROC-H
Connect to your customers’ data analysis needs
Per HammerDB, the creator of this workload and the harness we use to run it, TPROC-H “represents the typical workload of a retailer running analytical queries about their operations.”1 Results from TPROC-H are, however, useful outside of retail environments. Any organization, from finance to healthcare and beyond, that runs data analytics or decision support workloads might find value in this data.
TPROC-H outputs results in terms of how long a system takes to complete sets of queries.
HammerDB derived this workload from the TPC-H benchmark specifications, but it is not a full implementation of official TPC-H standards. Consequently, TPROC-H results are not directly comparable to published TPC-H results.
OLTP with DVD Store
Model real-world ecommerce systems
The DVD Store benchmark simulates an online store, including customers logging in, browsing and purchasing products, reviewing products, browsing and rating reviews, and more.1 Though this simulated store sells DVDs, results could apply to ecommerce operations of all types—and indeed, to any organization relying on OLTP workloads. Our team has experience running both the latest version, DVD Store 3 (DS3), and the previous version, DVD Store 2 (DS2). Both versions of the benchmark output results in orders per minute, or OPM.
NoSQL testing with YCSB
Measure performance for big data workloads
YCSB, or the Yahoo! Cloud Serving Benchmark, is a benchmark utility/framework that focuses on various NoSQL databases, including document-based databases such as MongoDB and Azure Cosmos; key-value store databases such as Aerospike, Redis, and DynamoDB; and wide column-based databases such as Apache Cassandra and HBase. The utility provides six types of workloads that perform operations on data; I/O patterns are the primary differences among the workloads.1 Depending on the database systems we are testing and the goals of the study, we may use different workloads within the tool. YCSB outputs results in operations per second, or OPS.
VDI testing with Login Enterprise
Measure user count, responsiveness, & more in VDI environments
As the pandemic has fueled an increase in remote and hybrid work environments, virtual desktop infrastructure (VDI) has become even more critical for many companies. Login Enterprise, a tool from Login VSI, simulates VDI users with a variety of personas. With it, we can test how many users a VDI solution can handle and what level of responsiveness those users would see. Login Enterprise can output multiple results, including how long it takes for users to log in, how many users the solution under test can support, and a user experience score.
Note that due to the Login Enterprise licensing requirements, any project with this testing must include funding for a Login Enterprise license or access to an existing license our client owns.
AI/ML testing: Image classification
Measure AI image classification performance
In AI image classification use cases, organizations utilize artificial intelligence to identify objects in images or categorize an image based on elements inside that image. You might turn to an image classification workload to assist with identifying an issue in a medical scan, moderating image content in a chatroom, or conducting facial recognition on a security camera’s feed, among other use cases.
Multiple models are available for image classification, among them ResNet-50 and MobileNet. We’re happy to assess the training and/or inference performance of your solution using either model. In the past, we’ve tested with multiple benchmarks that make use of one or both of these models, including MLPerf’s image-classification tests and AIXPRT. Though each benchmark works differently, the output is typically the number of images, frames, or queries per second that the solution under test is able to process. We’ve worked with common frameworks such as TensorFlow and Intel Model Zoo.
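As a simplified illustration of what an image-classification throughput measurement looks like, the sketch below times ResNet-50 inference in PyTorch and reports images per second. The framework, batch size, batch count, and synthetic input are illustrative assumptions, not the exact configuration we would use for a given study.

```python
# Minimal sketch: timing ResNet-50 inference to estimate images/second.
# PyTorch/torchvision, the batch size, and the synthetic input are
# illustrative assumptions, not a specific test configuration.
import time

import torch
import torchvision.models as models

model = models.resnet50(weights=None)  # weights are irrelevant for raw throughput
model.eval()

batch_size, num_batches = 32, 50
images = torch.randn(batch_size, 3, 224, 224)  # synthetic 224x224 RGB batch

with torch.no_grad():
    start = time.perf_counter()
    for _ in range(num_batches):
        model(images)
    elapsed = time.perf_counter() - start

print(f"Throughput: {batch_size * num_batches / elapsed:.1f} images/second")
```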
For some platforms, we may need to incorporate code development time in the scope of the project; contact us to learn more.
AI/ML testing: Recommendations
Quantify AI recommendation performance
AI recommendation models make use of varying sources of data—such as user information, behavioral patterns, and interactions between users—to deliver personalized ads, product recommendations, or other results in e-commerce and social media settings.1,2 For example, an AI recommendation model for an online retailer might analyze customer click data and search history to infer other products in which that customer might be interested.
These models include Wide & Deep, a machine learning workload useful for retail or e-commerce applications, and the deep learning recommendation model (DLRM), which Facebook created to surface the advertisements a user is most likely to click. We’re able to assess training and/or inference performance on these workloads and others, working with common benchmark suites and frameworks such as MLPerf and Intel Model Zoo.
For some platforms, we may need to incorporate code development/optimization time in the scope of the project; contact us to learn more.
AI/ML testing: Language processing
Measure performance for natural language processing
Natural language processing, or NLP, is one of the best-known forms of AI in 2024 thanks to the OpenAI tool ChatGPT. ChatGPT, a free AI chatbot based on a large language model (LLM), is an NLP tool in use by 100 million people per week as of November 2023.1 It’s far from the only language processing model, however. NLP models can make predictions and analyses, answer questions, translate text from one language to another, and much more. These tools have wide applications across every industry—imagine a retail chain with an intelligent online chatbot that can answer detailed customer questions about products, or a data-analytics firm with the ability to both translate and analyze documentation in multiple languages in real time.
There are a host of language processing models available, including the LLMs GPT-3, GPT-4, and the free open-source Llama 2 from Meta; the translation models Neural Machine Translation (NMT) and Transformer; and Bidirectional Encoder Representations from Transformers (BERT), among others. We can test a solution’s performance on these models in a variety of ways depending on the model and the goal; each approach would have different outputs.
For some platforms, we may need to incorporate code development/optimization time in the scope of the project. Contact us to learn more.
I/O testing with FIO
Customize a workload to measure drive and memory performance
Measuring disk activity—in either input/output (I/O) operations per second or in throughput (e.g., GB/s)—indicates how robust a device’s storage subsystem is. With the Flexible I/O (FIO) tool, we can customize I/O workloads to reflect performance relevant to many use cases. For example, a large block sequential read workload mirrors video streaming, while large block write-heavy workloads may simulate importing large amounts of data, and a small block mix of reads and writes might mimic e-commerce applications or ticketing systems. Our team can customize these tests with smaller or larger blocks of data, more or fewer threads, and a number of other factors, depending on your needs. FIO outputs results in input/output operations per second, or IOPS, as well as bytes per second, which we might translate into MB/s or GB/s.
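As one hypothetical example of the customization described above, the sketch below launches a small-block, 70/30 random read/write FIO job from Python. Every parameter here (target file, block size, mix, queue depth, thread count, run time) is an assumption we would tune to the use case under study.

```python
# Hypothetical FIO job resembling a small-block, OLTP-style mix of reads
# and writes. All parameters are illustrative assumptions.
import subprocess

fio_cmd = [
    "fio",
    "--name=oltp-style-mix",
    "--filename=/mnt/test/fio.dat",  # assumed target file or device
    "--rw=randrw",                   # random mixed reads and writes
    "--rwmixread=70",                # 70% reads / 30% writes
    "--bs=8k",                       # small block size
    "--ioengine=libaio",
    "--direct=1",                    # bypass the page cache
    "--iodepth=32",
    "--numjobs=4",                   # worker threads
    "--time_based", "--runtime=300",
    "--group_reporting",
    "--output-format=json",
]

result = subprocess.run(fio_cmd, capture_output=True, text=True, check=True)
print(result.stdout)  # JSON output includes IOPS, bandwidth, and latency
```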
The new HP Z8 Fury G5 Workstation Desktop PC compared to a Lenovo ThinkStation P620 Tower Workstation
AI/ML testing: Image segmentation
Assess AI image segmentation performance for medical imaging and more
AI image segmentation workloads divide an image into sections and classify those sections into different categories. Self-driving cars might use this type of AI to detect a road sign or a pedestrian, while medical research facilities or hospitals might put it to work detecting cancerous tumors in MRI scans, and environmental groups might utilize it for classifying different types of terrain in a satellite image.
There are many ways to approach image segmentation. As just one example, we have experience testing with 3D U-Net—which targets medical imaging—using the MLPerf benchmark suite. This tool produces metrics of latency and how many frames per second the solution could process. We’re also happy to assess the training and/or inference performance of your solution using other image segmentation frameworks or implementations.
For some platforms, we may need to incorporate code development/optimization time in the scope of the project; contact us to learn more.
AI/ML testing: Object detection
Evaluate AI/ML object detection performance
In an AI object detection workload, a system identifies and locates instances of objects from a set of categories in a given dataset (typically, images and videos). Object detection has a wide variety of applications, including threat detection for security and surveillance, self-driving car functions in the transportation industry, and medical imaging, to name a few.
There are several popular object detection models currently in use, including RetinaNet, Faster R-CNN (Region-based Convolutional Neural Network), YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector). We’re able to compare training or inference performance on these models using benchmarks and datasets such as MLPerf’s, COCO (Common Objects in Context), and PASCAL VOC (Visual Object Classes). Each benchmark measures performance differently and offers different outputs.
For some platforms, we may need to incorporate code development/optimization time in the scope of the project; contact us to learn more.
AI/ML testing: LLM testing with PTChatterly
PTChatterly is a new LLM benchmark and sizing framework, exclusively available from PT, that quantifies the performance and user experience your customers can expect for a solution running an in-house LLM.
Utilizing an existing LLM and the retrieval augmented generation (RAG) method, it searches a local corpus of data and constructs responses in AI-assisted chatbot conversations with multiple simulated users. It generates meaningful, real-world metrics, e.g., “32 people can have simultaneous conversations with at worst XX response time.”
For some platforms, we may need to incorporate code development/optimization time in the scope of the project; contact us to learn more.
AI/ML testing: Speech recognition
Gauge a solution’s performance for AI/ML speech recognition
An automatic speech recognition (ASR) AI/ML model does exactly what you’d expect: predict text based on audio input of one or more people speaking.1 ASR applications are useful for captioning videos, transcribing podcasts and meetings, improving accessibility, and much more.
There are as many different ASR models as there are ways to assess their performance. To take just one example, we can test a solution’s training or inference performance on the Recurrent Neural Network Transducer (RNN-T) model using the MLPerf RNN-T benchmark, which outputs several key metrics: throughput in training sequences processed per second, latency (i.e., the time a solution takes to process a single sequence or inference), Word Error Rate (WER), Character Error Rate (CER), and training time (i.e., the time a solution takes to train the RNN-T model to convergence).
For some platforms, we may need to incorporate code development/optimization time in the scope of the project; contact us to learn more.
I/O testing with Iometer
Understand how a system performs under differing loads
Iometer generates input/output (I/O) operations to stress a solution’s storage capabilities. By configuring block size and read/write ratio, we can use this synthetic workload tool to imitate common workload types such as online transaction processing (OLTP) or data analysis. It yields outputs of input/output operations per second, latency, and throughput. Alternatively, we can use Iometer to drive disk load to stress a solution while also running a different workload or benchmark for additional performance testing.
DISKSPD
Evaluate synthetic storage performance
DISKSPD is a load generator from Microsoft that allows us to create simple synthetic storage workloads containing custom mixes of reads and writes. It is useful for evaluating raw storage performance.1 When we use it in conjunction with a tool such as Perfmon or VMFleet, it allows us to measure the number of input/output operations per second (IOPS) that a solution can sustain.
Vdbench and HCIBench
Gauge storage performance with flexible synthetic workloads
Vdbench is an open-source, synthetic benchmark tool that generates disk input/output (I/O) workloads to test a solution’s storage performance, while HCIBench—short for hyper-converged infrastructure (HCI) benchmark—draws on Vdbench and the I/O generator Fio to automate testing on HCI solutions. Both tools allow us to customize the read/write ratio of an I/O workload to simulate a specific real-world workload. The benchmarks yield results in input/output operations per second (IOPS), latency, and throughput.
Backup and recovery studies (enterprise)
Measure speed and features of backup and recovery solutions
For both backup and recovery, speed is of the utmost importance: A fast solution can minimize downtime, while a slow one can cause significant problems in the event of a storage loss. We can perform backups, simulate losses, and initiate recoveries at scale to test the real-world capabilities of your and/or a competitor’s backup and recovery solution. We work with you to customize what data we’re backing up, the size and type of that data, whether we perform incremental or complete backups, and more.
Deployment studies (enterprise)
Quantify ease of deployment
Deployment is the first step in putting a solution to work. We can deploy a technology ourselves, using the same steps a real-world IT team would, and generate a deployment guide or a test report quantifying the time and effort the deployment required. We can also test your and/or a competitor’s deployment services, acting as a customer and tracking the elapsed time and customer experience from first sales contact to functioning solution.
Manageability studies
Assess time and effort savings from manageability tools
Every piece of technology requires management, so organizations win when IT staff can minimize the time and effort they spend on everyday management tasks. We test manageability tools the way real IT teams use them. In addition to verifying the functionality of key features, we perform common, real-world tasks and measure how much time and how many steps they take, quantifying a tool’s ease of use. In this way, we can compare one tool to another or measure how much time companies can save by using the tool rather than a manual approach.
iPerf
Measure network speed
iPerf is a synthetic, cross-platform network test that measures maximum network bandwidth. It allows us to test with multiple network protocols: transmission control protocol (TCP), stream control transmission protocol (SCTP), and user datagram protocol (UDP). iPerf generates network protocol streams and then measures the bandwidth available, packet rate, packet loss, and more. Its output may include throughput in Gbps and packet rate in packets per second, among other data.
Thin client testing
Assess VDI performance from an end-user perspective
If your end-users log into VDI desktops every day, you may see an opportunity to cut costs by providing them with thin clients rather than full-featured laptops or desktops. The ideal thin client would be inexpensive but easy to manage and deliver a responsive experience in the apps your teams use daily. We sometimes test thin clients in the same ways we test all end-user devices, including with a variety of benchmarks and by hand-timing real-world tasks. We can also assess thin clients from other perspectives more specific to this type of device, including ease of manageability and the number of different VMs a single device can support.
JetStream
Measure web browsing performance
As a JavaScript and WebAssembly benchmark suite, JetStream runs several subtests—including “a variety of advanced workloads and programming techniques”—and combines the results into a single score.1 It gives a higher score to browsers that start up or execute code more quickly, which could translate to smoother user experiences.
Frametest
Measure the rate of video and image transmission
Working with video requires moving large volumes of data from one location to another. Frametest is a synthetic benchmark utility that can evaluate storage performance specifically for video editing use cases. It outputs results in throughput and the number of frames per second (FPS) a solution can sustain, providing insight into such areas as video quality (a solution that handles more frames per second typically displays information more clearly); the experience of video content creators (being able to send data over a network quickly can boost productivity); the efficiency of applications that rely on transferring video, such as security and targeted advertising in retail; and storage performance.
SLOB
Assess Oracle performance
SLOB, or Silly Little Oracle Benchmark, is a tool that generates random read and write I/O operations for Oracle databases, allowing us to measure how many input/output operations per second (IOPS) a solution can handle. The ability to process more IOPS while still supporting high throughput indicates a solution’s ability to support periods of heavy user database activity. SLOB yields results in IOPS, latency, and throughput.
Spark LDA
Measure big data classification performance with Spark
The Spark LDA workload, part of the Intel HiBench suite of benchmarking software, is a topic model that “infers topics from a collection of documents.”1 Latent Dirichlet allocation (LDA) is a real-world technique to dynamically analyze text, identifying, categorizing, and refining topics in a text document as well as summarizing the document. Businesses might use LDA to organize customer reviews on a product or recommend new products based on user history. In using this workload for performance testing, we measure throughput while the workload is running and how long it takes a solution to complete the workload.
Spark LR
Quantify decision-making performance with Logistic Regression
Logistic Regression (LR) is a machine learning algorithm that predicts which of a set of categories a response falls into; it can help deliver insights to decisionmakers or recommend products to online shoppers. The Spark LR workload, part of the Intel HiBench suite of benchmarking software, measures how well a system performs LR classification on categorical data (data that one can categorize into groups), continuous data (data that can have any value), and binary data (data that can only take one of two values).1 When we test with Spark LR, we can measure throughput while the workload runs and how long a solution needs to complete the workload.
Weathervane
Quantify the performance of on-premises and cloud-based Kubernetes clusters
Weathervane is an application-level Kubernetes benchmark that, per VMware, "measures the performance capabilities of a Kubernetes cluster by deploying one or more instances of a benchmark application on the cluster and then driving a load against those applications."1 The benchmark application in question is a multi-tier, real-time auction web app, where simulated users view items and make bids within a set time frame. Weathervane outputs results in WvUsers, representing the greatest number of simulated users the application instances can support while meeting a certain level of performance.
SPECint
Measure compute-intensive performance
Part of the SPEC CPU 2017 benchmark, SPECrate 2017 Integer (also often known by the shorthand name, SPECint) is a suite of tests that stress the processor, memory, and compilers of a solution to provide a general view of compute-intensive performance. It executes a variety of workloads, including artificial intelligence, general data compression, video compression, discrete event simulation, route planning, and more. SPECint generates two higher-is-better throughput scores, one reflecting base performance and the other reflecting peak performance.
SPECfp
Assess floating point performance
Part of the SPEC CPU 2017 benchmark, SPECrate 2017 Floating Point (also often known by the shorthand name, SPECfp) is a suite of tests that stress the processor, memory, and compilers of a solution to provide insight into floating point performance. Its 13 floating point workloads include explosion modeling, fluid dynamics, molecular dynamics, weather forecasting, computational electromagnetics, and regional ocean modeling.1 SPECfp generates two higher-is-better throughput scores, one reflecting base performance and the other reflecting peak performance.
LAMMPS
Measure high performance computing (HPC) capabilities
LAMMPS (an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator) is an open-source molecular dynamics tool we can use to assess a solution’s high performance computing capabilities. The tool includes several benchmark tests, some targeting a system’s CPUs and some its GPUs. LAMMPS output varies depending on the benchmark you choose.
Aerospike Database performance on Dell EMC PowerEdge R740xd servers with Intel Ethernet 800 Series adapters and ADQ
Aerospike Database
Assess Aerospike Database performance
Aerospike Database is a NoSQL database platform that supports multiple deployment options on premises or in the cloud. According to Aerospike, the platform "delivers predictable performance, scales from gigabytes to petabytes, is strongly consistent, with unparalleled cross-datacenter replication for a true globally distributed real-time database."1 To assess overall Aerospike Database performance, we use the Aerospike C benchmark, which Aerospike builds into the Aerospike platform. Aerospike C lets users insert records into the database and run tests with a variety of read/write mixes and thread counts.
Optimize creative and design workflows and enjoy a better user experience with the Dell Precision 5680
Video AI
Evaluate system performance using video AI
For creatives who use AI to improve video quality and resolution, Topaz Video AI may be the software of choice. According to Topaz Labs, the solution “focuses solely on completing a few video enhancement tasks really well: deinterlacing, upscaling, and motion interpolation.”1 Topaz Video AI includes a built-in benchmark, which outputs scores that reflect a system’s efficiency, processing time, and frames-per-second rates as it processes sets of videos at different resolutions.
Random Forest
See how well a solution performs using Random Forest to make predictions
According to HiBench, “Random forests (RF) are ensembles of decision trees. Random forests are one of the most successful machine learning models for classification and regression. They combine many decision trees in order to reduce the risk of overfitting.”1 An organization could use random forests to increase the accuracy of a decision tree; for example, a bank might use RF to make credit risk predictions. Using the HiBench RF workload, we can measure a solution’s throughput while the workload is running and how long it takes a solution to complete the workload.
WordPress
Quantify website performance
WordPress is a popular open-source content management system based on PHP and MySQL, which we can use to represent a web server workload. To see how well a given solution performs when running WordPress, we use benchmarks to simulate a large number of users accessing a website. These include the open-source website transaction benchmark suite oss-performance and the Apache JMeter load-testing tool. Regardless of the benchmark, we typically look at WordPress performance in terms of web server transactions per second, with a higher number indicating better performance.
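To make the transactions-per-second idea concrete, here is a deliberately simplified sketch that issues concurrent HTTP requests against a site and reports how many complete per second. Real testing would use oss-performance or JMeter rather than this toy client, and the URL, user count, and request count are assumptions.

```python
# Toy load generator: N concurrent "users" fetching a page, reported as
# transactions per second. URL, user count, and request count are
# illustrative assumptions; real studies use oss-performance or JMeter.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://wordpress.example.test/"  # hypothetical site under test
USERS = 32
REQUESTS = 1000

def fetch(_):
    return requests.get(URL, timeout=10).status_code

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=USERS) as pool:
    statuses = list(pool.map(fetch, range(REQUESTS)))
elapsed = time.perf_counter() - start

ok = sum(1 for status in statuses if status == 200)
print(f"{ok}/{REQUESTS} successful transactions, {ok / elapsed:.1f} per second")
```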
Reduce software licensing and other costs by choosing latest-generation 16G Dell PowerEdge servers powered by 4th Gen AMD EPYC processors
Mixed cloud workloads on servers with Intel Xeon Platinum processors vs. AMD processor-powered solution
TeraSort
Measure Hadoop performance
The TeraSort workload, part of the Intel HiBench suite of benchmarking software, determines how quickly Hadoop clusters can sort a dataset. It combines network, I/O, and compute tasks to provide insight into how a cluster might handle general-purpose jobs on Hadoop. TeraSort generates a lower-is-better score that represents the time to complete a workload and a higher-is-better throughput score.
VMFleet
Assess storage performance
VMFleet is a set of scripts that Microsoft developed to test a solution’s storage performance. With VMFleet, the user deploys multiple VMs, each running a storage load generator, DISKSPD. According to Microsoft, "DISKSPD is a tool that you can customize to create your own synthetic workloads, and test your application before deployment."1
WordCount
Evaluate how well a system performs MapReduce analysis
Part of the Intel HiBench suite of benchmarking software, WordCount tallies the occurrence of each word in a randomized dataset. This CPU-intensive workload makes use of MapReduce, a common Apache Hadoop framework organizations use to access big data. HiBench says that it represents real-world MapReduce tasks, as it extracts “a small amount of interesting data from [a] large data set.”1 With WordCount, we can quantify both the throughput a solution delivers while analyzing the data and how long it takes the solution to complete the workload.
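For readers unfamiliar with the pattern, the sketch below shows the map and reduce steps of a word count in plain Python; the tiny document list is invented for illustration, while HiBench runs the equivalent job across a Hadoop cluster and a far larger generated dataset.

```python
# Word count expressed as map and reduce steps, the same pattern the
# HiBench WordCount workload runs at cluster scale on Hadoop.
from collections import defaultdict

documents = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog",
]

# Map: emit (word, 1) pairs from every document.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle/reduce: sum the counts for each word.
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'the': 3, 'quick': 2, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 2}
```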
Benchcraft
Measure OLTP performance
Microsoft Benchcraft is an online transaction processing (OLTP) database benchmark similar to TPC-E, which simulates a stockbroker performing stock trades. Benchcraft generates a transactions-per-second metric.
Benchcraft is not a full implementation of official TPC-E standards, so any Benchcraft results are not directly comparable to published TPC-E results.
Intel Edge Insights for Industrial
Measure Intel EII performance
Intel Edge Insights for Industrial (EII) is a software package that captures video and time series data from factory environments and uses AI analysis to produce near-real-time intelligence that companies can act on to improve operations. For companies that are interested in implementing Intel EII, we can test (1) video streaming ingest and inference performance and (2) time series ingest and store performance on various hardware solutions.
Data reduction testing
Determine data compression and deduplication ratios
Data center sprawl can be very costly, so storage solutions that minimize the amount of space your data occupies are increasingly valuable. In our data reduction testing, we measure before-and-after storage usage to evaluate the compression and deduplication capabilities of different storage solutions.
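The underlying arithmetic is straightforward, as in the short sketch below; the byte counts are invented for illustration.

```python
# Data reduction ratio = logical (pre-reduction) bytes / physical bytes
# actually stored. The figures below are invented for illustration.
logical_bytes = 120 * 1024**4   # 120 TiB written by the workload
physical_bytes = 30 * 1024**4   # 30 TiB consumed after compression + dedupe

ratio = logical_bytes / physical_bytes
print(f"Data reduction ratio: {ratio:.1f}:1")                        # 4.0:1
print(f"Space saved: {1 - physical_bytes / logical_bytes:.0%}")      # 75%
```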
Decision support workload
Measure how well a solution can handle a decision support system
To test a solution’s capabilities in this area, we often use a workload derived from the TPC-DS benchmark, which simulates a decision support system. Decision support applications analyze large quantities of information and utilize that data to offer possible next steps, making them valuable to organizations across a broad variety of industries.
This workload outputs results in query response time: how long it takes to complete a set of queries of a certain size. (The workload is not a full implementation of official TPC-DS standards, so any results are not directly comparable to published TPC-DS results.)
Optimize creative and design workflows and enjoy a better user experience with the Dell Precision 5680
Video encoding
Quantify intensive video encoding performance
Video files can be extremely large. Compressing a video file so it’s compatible with web and mobile players—while still maintaining picture quality—is called encoding.1 This important work is also one of the most resource-intensive tasks for a system,2 stressing memory, CPU, and GPU. Using a benchmark from HandBrake, a popular converting and encoding platform, we can measure both how long it takes a system to encode a video and the rate of frames per second (FPS) a system handles.
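As a rough illustration, the sketch below times a HandBrakeCLI encode from Python. The input file, preset, and frame count are assumptions, and deriving average FPS as frames divided by wall-clock time is a simplification (HandBrake also reports FPS in its own output).

```python
# Hypothetical timing of a HandBrakeCLI encode. Input file, preset, and
# frame count are illustrative assumptions.
import subprocess
import time

cmd = [
    "HandBrakeCLI",
    "-i", "input_4k_clip.mov",        # assumed source file
    "-o", "output_1080p.mp4",
    "--preset", "Fast 1080p30",       # assumed preset
]

start = time.perf_counter()
subprocess.run(cmd, check=True)
elapsed = time.perf_counter() - start

total_frames = 5400                   # assumed: 3 minutes of 30fps source
print(f"Encode time: {elapsed:.1f} s, average {total_frames / elapsed:.1f} FPS")
```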
Power efficiency testing
Determine performance-per-watt metrics
The server selection process might initially focus on reliability and performance, but energy consumption is an increasingly important consideration. A server that delivers more work for each watt of electricity it consumes can translate to lower data center power and cooling expenses. In our power efficiency testing, we measure a solution’s energy usage and combine it with performance data to determine performance-per-watt metrics that help buyers make informed decisions.
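The metric itself is a simple ratio, as in this sketch; both numbers are invented placeholders.

```python
# Performance per watt = work completed / average power draw.
# Both figures below are invented placeholders.
benchmark_score = 12500          # e.g., new orders per minute, operations/s, etc.
average_power_watts = 410.0      # measured at the wall during the run

perf_per_watt = benchmark_score / average_power_watts
print(f"Performance per watt: {perf_per_watt:.1f}")
```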
VM and container density testing
Determine how many virtual machines or containers a solution can support
By selecting solutions that achieve greater VM or container density, companies maximize the value of virtualization and can save by being able to perform a given amount of work with a smaller investment. In our density testing, we typically select a workload, such as an OLTP database, and measure the performance of a single VM or container running that workload; we then have a baseline. We next add virtual machines or containers until the average per-VM/per-container performance falls well below the baseline. For example, if performance dropped at eight VMs, we would say the solution supported seven virtual machines at this level of performance.
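The sketch below captures that decision logic in code. The per-VM measurement function, the fabricated results, and the threshold for "well below the baseline" are stand-ins you would replace with a real workload and an agreed acceptance criterion.

```python
# Sketch of the density procedure described above: establish a single-VM
# baseline, add VMs until average per-VM performance falls below an
# acceptable fraction of that baseline, then report the last good count.
# measure_per_vm_performance() is a hypothetical stand-in for running the
# chosen workload (e.g., an OLTP database) at a given VM count.
ACCEPTABLE_FRACTION = 0.95  # assumed threshold for "well below the baseline"

def find_supported_density(measure_per_vm_performance, max_vms=64):
    baseline = measure_per_vm_performance(1)
    supported = 1
    for vm_count in range(2, max_vms + 1):
        avg_per_vm = measure_per_vm_performance(vm_count)
        if avg_per_vm < ACCEPTABLE_FRACTION * baseline:
            break           # e.g., performance dropped at 8 VMs...
        supported = vm_count
    return supported        # ...so we report 7 supported VMs

# Example with fabricated measurements, for illustration only:
fake_results = {1: 100, 2: 99, 3: 99, 4: 98, 5: 97, 6: 96, 7: 96, 8: 88}
print(find_supported_density(lambda n: fake_results.get(n, 80)))  # prints 7
```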
PCMark
Demonstrate device performance in the workplace
PCMark is a benchmark set that, per UL, “features a comprehensive set of tests that cover the wide variety of tasks performed in the modern workplace.”1 The main tool (PCMark 10 benchmark) evaluates PCs based on real-world tasks, such as web browsing, video calls, working with spreadsheets, photo editing, and more. Based on a system’s responsiveness while completing the tasks, PCMark delivers scores for each task and category of tasks, as well as an overall score—the higher the score, the smoother the experience end users can expect.
Along with benchmarks for applications and storage, PCMark also features a battery life test that measures how long a worker could expect their device to run on battery.
Cinebench
Evaluate PC hardware performance
According to Maxon, “Cinebench offers a real-world benchmark that incorporates a user’s common tasks within Cinema 4D to measure a system’s performance.”1 Outputting scores for both single-core and multi-core CPU performance, Cinebench measures how a device runs under a high CPU load, allows you to gauge how well the cooling system works during longer tasks, and tests how the device works with intensive 3D tasks.2 Higher scores could indicate faster PC response times on graphics-intensive games, product development and design software, and scientific simulations.
Benchmark performance and battery life on a Dell Latitude 5320 Business Laptop with Intel Core i5-1145G7 vPro processor
MobileMark
Put battery life and device performance to the test
Designed to run on a PC device while it’s unplugged, MobileMark simulates a worker performing productivity and creativity tasks on applications such as Microsoft 365 apps, Google Chrome, and Adobe Creative Cloud. The benchmark reports battery life in hours and minutes while performing these tasks, a battery performance score, and an index score that shows the balance between battery life and performance.1
Migration studies
Assess the ease of migration between solutions
As organizations periodically update and expand their technology infrastructures, the easier it is to migrate from an existing older solution to a new one, the better. We perform migrations the way they happen in the real world, measuring the time and effort they take between different solutions or using different migration or management tools. We can do this for both small-scale laptop changeovers and large-scale, complex upgrades from, for example, one storage array or server cluster to another.
Battery life and system performance on a Dell Latitude 5420 laptop with Intel Core i5-1145G7 vPro processor
CrossMark
Evaluate the user experience
How quickly a device completes workloads (performance) and responds to a user’s input (responsiveness) both affect user experience. The CrossMark benchmark uses models of real-world applications to measure performance and responsiveness and assigns an overall score based on both. According to BAPCo, developers of CrossMark, it is a “cross-platform benchmark that enables direct comparison of desktops and touch enabled devices across Windows, iOS or macOS, and Android.”1
User experience from multiple angles on the HP EliteBook 840 G9 with Intel Core i7 U Series or P Series processors
Productivity at multiple price points with Dell Latitude 5000 series laptops featuring 12th Gen Intel Core processors
PugetBench
Measure Adobe Creative Cloud capabilities
The Adobe Creative Cloud suite comprises applications that place different kinds of frequently heavy demands on compute resources. It can be challenging for creative professionals and those who buy gear for them to understand how well a given workstation can support these applications. Puget Systems has developed the PugetBench suite of benchmarks “to thoroughly test many of Adobe’s most popular applications using real-world projects and workflows.”1
The suite includes PugetBench for Photoshop, PugetBench for Lightroom Classic, PugetBench for Premiere Pro, and PugetBench for After Effects.
Business and creative performance of Dell Latitude 5000 and 7000 series laptops with 12th Gen Intel Core processors
Productivity at multiple price points with Dell Latitude 5000 series laptops featuring 12th Gen Intel Core processors
WebXPRT
Quantify browser performance
WebXPRT 4 is an industry-standard browser benchmark that compares the performance of web-enabled devices when executing real-world tasks. It contains HTML5, JavaScript, and WebAssembly-based scenarios that mirror activities users perform: Photo Enhancement, Organize Album Using AI, Stock Option Pricing, Encrypt Notes and OCR Scan using WASM, Sales Graphs, and Online Homework.
Principled Technologies is the publisher of the XPRT family of benchmarks and the administrator of the BenchmarkXPRT Development Community.
Procyon benchmark suite
Gauge device performance for professional users
The Procyon benchmark suite from UL comprises five benchmarks that target “professional users in industry, enterprise, government, retail and press.”1 The Office Productivity Benchmark can measure Windows or macOS device performance for office productivity work using Microsoft Office apps. Also available for Windows or macOS, the Photo Editing Benchmark uses Adobe Lightroom to import, process, and modify a selection of images, then applies multiple edits and layer effects to a photograph in Adobe Photoshop. The AI Inference Benchmark uses neural network models such as MobileNet V3, Inception V4, SSDLite V3, and DeepLab V4. The Battery Life Benchmark measures the battery life of Windows laptops, notebooks, and tablets across different scenarios. And lastly, the Video Editing Benchmark measures the time it takes a PC to apply edits, adjustments, and effects and then export video project files to common formats in Adobe Premiere Pro. These latter three benchmarks are currently Windows-specific, but UL says macOS support is on the way.
SYSmark
Measure system performance
According to BAPCo, developers of SYSmark, the benchmark measures system performance using “real-world applications and workloads.”1 It generates an overall rating based on a combination of results that measure system performance while running office and media tasks and scenarios that model common pain points, such as launching files and multitasking. We can run the latest version, SYSmark 30, or the second-to-latest version, SYSmark 25, depending on your preferences.
Benchmark scores, task-based performance, and feature comparison of the Microsoft Surface Book 3 vs. a MacBook Pro
Geekbench
Measure the single- and multi-core power of a system’s processor
Geekbench is a cross-platform utility that aims to measure the single-core and multi-core power of the central processing unit of a computer. Primate Labs, developer of Geekbench, says the latest version, Geekbench 6, “measures performance in new application areas including Augmented Reality and Machine Learning.”1
Portability, performance, and collaboration on the Lenovo ThinkPad X1 Extreme Gen 4 powered by Intel
Maxon Redshift
Measure GPU-accelerated rendering performance
According to Maxon, Redshift is “a powerful GPU-accelerated renderer, built to meet the specific demands of contemporary high-end production rendering. Tailored to support creative individuals and studios of every size, Redshift offers a suite of powerful features and integrates with industry standard CG applications.”1 We test a system’s performance by measuring how long the system takes to complete a Redshift render.
Blender
Assess CPU and GPU performance
The Blender benchmark measures CPU and GPU performance by determining the number of samples per minute a system can handle. Blender Open Data is a platform the Blender community provides to collect, display, and query the results of hardware and software performance tests.1
Onshape
Measure CAD capabilities
Onshape is a computer-aided design software system, available using a software-as-a-service model. The Onshape online performance check gives the browser and GPU an increasing workload and reports the device’s limits when its frame rates, measured in lines and triangles per second, start to drop.1 The higher the frame rate a device can support, the better the CAD performance users can expect.
Speedtest
Speedtest by Ookla tests the speed of devices’ connections to the internet. It outputs results in megabits per second (Mbps) for both downloads and uploads. To ensure that any differences in connection speed are due to the devices rather than the internet service provider or other external networking elements, when using this test for multiple devices, we run tests on all devices at the same time and in the same networking environment.
PassMark
Assess multiple elements of PC performance
PassMark offers a number of benchmarks for performance testing of Windows and Linux systems. Among others, these include BurnInTest, which quickly stress-tests PCs; PerformanceTest, which combines 28 speed tests for easy benchmarking of CPU, 2D and 3D graphics, disk, and memory; and MemTest86, which stress-tests a system’s memory.1 Output depends on which PassMark benchmark you choose.
HDXPRT
Test how Windows 10 PCs handle media and more
HDXPRT 4 is a benchmark that assesses the performance of Windows 10 devices by measuring how well they handle real-world media tasks. HDXPRT 4 uses real commercial applications, such as Photoshop and MediaEspresso, to carry out tasks based on photo editing, video conversion, and music editing.
Principled Technologies is the publisher of the XPRT family of benchmarks and the administrator of the BenchmarkXPRT Development Community.
MobileXPRT
Assess Android device performance
MobileXPRT 3 is a benchmark for evaluating the capabilities of Android devices. It runs six performance scenarios: Apply Photo Effects, Create Photo Collages, Create Slideshow, Encrypt Personal Content, Detect Faces to Organize Photos, and Scan Receipts for Spreadsheet. In addition to scoring each scenario, it generates a single overall performance score.
Principled Technologies is the publisher of the XPRT family of benchmarks and the administrator of the BenchmarkXPRT Development Community.
CrXPRT
Measure Chromebook performance and battery life
The CrXPRT 2 benchmark evaluates the performance and battery life of Chromebooks. The CrXPRT performance test measures how quickly a Chromebook handles everyday tasks such as playing video games, watching movies, editing pictures, and doing homework. It generates an overall score and individual scores for each workload. The CrXPRT battery life test produces an estimated battery life, a separate performance score, and a frames-per-second (FPS) rate for a built-in HTML5 gaming component.
Principled Technologies is the publisher of the XPRT family of benchmarks and the administrator of the BenchmarkXPRT Development Community.
Benchmark performance and responsiveness on the Fujitsu STYLISTIC R726 compared to the Dell Latitude 7275
TouchXPRT
See how Windows 10 devices stack up
TouchXPRT is a benchmark for evaluating the performance of Windows 10 devices. It runs tests based on five common scenarios: Beautify Photos, Blend Photos, Convert Videos for Sharing, Create Music Podcast, and Create Slideshow from Photos. TouchXPRT produces results for each of the five test scenarios plus an overall score.
Principled Technologies is the publisher of the XPRT family of benchmarks and the administrator of the BenchmarkXPRT Development Community.
System responsiveness and benchmark scores for Dell OptiPlex small and micro form-factor desktops powered by Intel vs. comparable HP desktops
AIXPRT
Evaluate your system’s machine learning inference performance
AIXPRT is an AI benchmark tool that makes it easier to evaluate a system's machine learning inference performance by running common image-classification, object-detection, and recommender system workloads. AIXPRT includes support for the Intel OpenVINO, TensorFlow, and NVIDIA TensorRT toolkits to run image-classification and object-detection workloads with the ResNet-50 and SSD-MobileNet v1 networks, as well as a Wide and Deep recommender system workload with the Apache MXNet toolkit. The test reports FP32, FP16, and INT8 levels of precision. Test systems must be running Ubuntu 18.04 LTS or Windows 10, and the minimum CPU and GPU requirements vary by toolkit.
Principled Technologies is the publisher of the XPRT family of benchmarks and the administrator of the BenchmarkXPRT Development Community.
Task-based time testing
See how systems perform real-world tasks
A user’s real-world experience of their system depends a great deal on how long it takes to accomplish the tasks they do every day. We select tasks representative of real-world workflows—such as booting a laptop, opening a document, or exporting a photo—and use hand timers to measure how many seconds a system requires to complete them. We can perform this testing with any task or series of tasks, making it a flexible, realistic approach to highlighting how a system could boost (or harm) productivity.
Thermal testing (end-user)
Discover surface temperatures while idle or under stress
Especially for hybrid or remote workforces who may use devices on their laps, a notebook’s surface temperature can be an important consideration. Excessive heat can be a matter of discomfort or, in extreme cases, could spell hardware failure. By measuring devices’ surface temperatures, we can demonstrate which might offer a more comfortable user experience or be less susceptible to overheating. We can perform this testing while a machine is at rest or under load and supplement the data with thermal camera images.
Thermal testing (enterprise)
See which systems run cooler under load
For organizations with large data centers, power and cooling can be a significant expense. In addition to assessing the quality of power management software tools and a system’s energy efficiency (see “Manageability studies” and “Power efficiency testing”), we can measure how well a device’s cooling mechanisms are functioning by assessing how much heat the device is outputting and whether that heat is having an effect on the device’s performance. Depending on the exact tests we run, we may be able to use that data to determine how much money an organization might save on power and cooling by choosing that device.
Task-based and benchmark testing for the classroom with Intel Celeron N3450 processor-powered Chromebooks
3DMark
Evaluate graphics and compute performance with demanding gaming workloads
According to UL Solutions, maker of 3DMark, “3DMark is used by millions of gamers, hundreds of hardware review sites, and many of the world's leading manufacturers.”1 3DMark offers a suite of benchmarks, each of which runs intensive gaming workloads that put a system’s graphics and compute capabilities to the test. While it outputs useful results for gamers, 3DMark can also help indicate system performance while running compute-intensive workloads that deal with 3D objects.
Acoustic testing
Assess how much noise a device emits
In office environments, background noise can disrupt users’ focus and even their memory.1 While noise-canceling headphones or closed doors can help, these measures could also leave users feeling isolated. Instead, to cut down on distracting noises, businesses might consider quieter devices for their employees. We test devices’ noise outputs while those devices are idle or running an intensive workload. As another useful point of comparison, we can present these results in conjunction with real-world sound levels. (For more on our audio testing capabilities, see “Microphone quality testing” and “Speaker loudness & quality testing”.)
Feature comparison
Compare features on multiple devices
Performance and battery life are critical concerns for buyers of end-user devices, but they’re not the only elements customers consider. We compare devices’ features in a host of areas, including size, weight, screen resolution, number and type of ports, camera and microphone specs, touchscreen and pen-and-ink options, convertibility, and much more. For any feature your device boasts, we can highlight its utility and functionality against the equivalent feature(s) from the competition.
Application verification
Verify apps are working as they should
If a user’s everyday work and/or their hobbies rely on specialty applications—such as CAD apps, engineering applications, or compute-intensive games—they need to know that those applications will run smoothly on any device they might purchase. We can install, open, and complete tasks in any application on any system to verify that the system can handle the application(s) and will function as a user would expect.
Speedometer
See how quickly web-based applications respond to user input
Speedometer is a browser-based responsiveness test for web applications. The benchmark simulates user actions in a demo app, then measures the time it takes to complete those tasks. A higher Speedometer score indicates faster responsiveness, which could translate to a smoother user experience.
Battery life testing
Test how long a device’s battery lasts under varying conditions
Battery life is a top concern for many buyers of end-user devices, whether the device is for business or personal use. We can assess a device’s battery life using a benchmark, such as PCMark or MobileMark, or by continuously running a real-world workload, such as local or streaming video playback, and measuring how long the device’s battery lasts before dying. We can also use this data to assess the efficiency of the battery in minutes per WHr.
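The battery-efficiency figure mentioned above is simple division, as in this small sketch; the runtime and capacity values are placeholders.

```python
# Battery efficiency in minutes per watt-hour (Wh). Placeholder values.
battery_life_minutes = 9 * 60 + 42       # 9 hours 42 minutes measured runtime
battery_capacity_wh = 53.0               # rated capacity of the assumed device

print(f"Efficiency: {battery_life_minutes / battery_capacity_wh:.1f} minutes per Wh")
```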
Bayesian classification (HiBench)
Measure ML data classification performance
The Bayesian classification workload from the HiBench suite is a machine learning (ML) workload that measures a solution’s ability to mine data to make probabilistic predictions. The training portion of Naïve Bayesian classification is a popular algorithm for knowledge discovery and data mining, with Bayesian classifiers performing important tasks such as identifying and filtering spam emails. In using this workload for performance testing, we measure throughput while the workload is running and how long it takes a solution to complete the workload.
Unigine benchmarks
Measure compute-intensive performance for gaming
The Unigine performance benchmarks “generate true in-game rendering workloads across multiple platforms” and are useful for assessing “the stability of PC hardware (CPU, GPU, power supply, cooling system) under extremely stressful conditions, as well as for overclocking.”1 The three benchmarks—Superposition, Valley, and Heaven—incorporate interactivity and large maps of detailed 3D terrain, making them suitable for assessing a device’s potential gaming and/or 3D rendering performance.
In-game benchmarks
Assess devices’ real-world gaming performance
If one of a PC user’s primary concerns is gaming performance, what better way to assess the device than using a benchmark built into a game? Several PC games, including Sid Meier’s Civilization VI and Deus Ex: Mankind Divided, include built-in benchmarks that measure metrics such as AI turn speed and frame rate. These benchmarks can also be valuable for an audience that does not prioritize gaming, allowing us to assess how a device performs under a very graphics-intensive workload.
Basemark GPU
Evaluate a device’s graphics performance
Basemark GPU, which Basemark calls “the ultimate graphics performance benchmark,” evaluates graphics performance on Windows, macOS, Ubuntu, Linux, Android, and iOS devices.1 It does this by running through a graphically intensive scene and generating a score that users can compare to other Basemark scores. It offers two modes: High Quality, for desktop systems, and Medium Quality, for mobile systems.
Durability testing and measuring the impact of temperature on performance for the Dell Latitude 7220 Extreme Tablet
Durability testing (end-user)
Discover how a device holds up under wear and tear
Many users primarily rely on their devices at home and in the office, packing them carefully into backpacks for the transition. But others need a device that can stand up to wear and tear—environments as varied as hospitals, construction sites, and classrooms all require tech that delivers both high performance and strong durability. We can test a laptop’s or tablet’s toughness from multiple perspectives, including how it handles drops, spills, scratches, heavy rain, bright sunlight, extreme temperatures, and more.
Durability testing (enterprise)
Test rugged capabilities for enterprise hardware at the edge
Organizations in industries from telecom to industrial engineering increasingly need data center-level computing in environments without data center-level control. We can assess how a rugged server or edge device performs in locations that are too hot, cold, damp, dusty, or high-vibration for standard enterprise hardware. We can also measure how that device might handle a specific disaster, such as a flood, a freeze, or an earthquake.
Deployment studies (on end-user devices)
Quantify and qualify the benefits of automated end-user device deployment
Deploying desktops and laptops to hundreds or thousands of employees has never been a straightforward task for IT, and today’s increasingly remote or hybrid environments have only added complexity. We can test your deployment tools and services the way real IT teams would use them, assessing how much time and effort they save compared to a manual deployment process or competing solution. Our reports incorporate in-depth discussions of our experience using your solution, focusing on benefits to both IT and end-users.
K-means
Assess machine learning capabilities to cluster data
Using workloads from the HiBench or Spark-Bench benchmark suites, we measure how well a system completes k-means clustering. As a machine learning algorithm, k-means clustering sorts data into similar groups, helping businesses analyze data to map new storefront locations or target marketing to different demographics. The sooner a solution completes this compute-intensive work, the sooner you can use the resulting data for business-critical endeavors. With this test, in which we have the option to customize the size of the dataset, we report how long it takes a solution to complete the workload and what throughput the solution delivers while doing so.
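As a small illustration of what k-means does with data (separate from the HiBench and Spark-Bench workloads themselves), the sketch below clusters a handful of two-dimensional points with scikit-learn; the points and cluster count are invented.

```python
# Tiny k-means illustration with scikit-learn. The points and cluster
# count are invented; the HiBench/Spark-Bench workloads run the same
# algorithm against much larger, generated datasets on a cluster.
from sklearn.cluster import KMeans

points = [
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],    # one natural group
    [8.0, 8.2], [7.8, 8.1], [8.3, 7.9],    # another
]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)          # e.g., [0 0 0 1 1 1] -- group membership
print(kmeans.cluster_centers_) # the center of each discovered group
```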
SPECviewperf
Quantify graphics performance with workloads based on professional applications
For systems running under OpenGL and DirectX application programming interfaces, SPECviewperf is “the worldwide standard for measuring graphics performance based on professional applications,” according to the SPEC organization.1 Running rendering workloads based on applications such as 3ds Max, Maya, and Solidworks, SPECviewperf outputs a score where higher numbers indicate better performance.
SPECworkstation
Assess workstation performance with a variety of workloads
The SPECworkstation benchmark suite tests a device’s processor, graphics card, I/O, and memory bandwidth performance across a series of 30 workloads containing almost 140 total tests. In addition to general tasks, workload categories include tests specific to industries such as product development, financial services, energy (oil and gas), and more. The SPEC organization modeled tests on real-world applications relevant to key sectors, such as the Blender and Maya apps for the media and entertainment industries.1 The higher each workload score, the faster you might expect a device to perform for employees doing that type of work.
Backup and recovery studies (end-user)
Measure the efficacy of any end-user backup and recovery solution
Losing laptop data can be a logistical and personal nightmare—not to mention a major impact to productivity at work. We can test the efficacy, ease of use, and features of your backup and recovery services and your competitors’, proving time savings and exploring your solution’s benefits to users and organizations alike.
Microphone quality testing
In an environment where many users work remotely or in open office environments, the quality of a system’s onboard microphone may become especially important. We can assess microphones by testing their off-axis rejection and/or noise reduction capabilities, both measurements that quantify how much ambient noise makes it through the microphone while a speaker is talking. For users who work in busy offices or loud environments, these metrics may be particularly useful: When they need to hop on a video call, the person on the other end should be able to hear their voice with clear, focused sound and minimal distracting background noises. A mic with better off-axis rejection may also reduce the need for a separate headset device.
Speaker loudness & quality testing
While some folks use headphones or external speakers, many people prefer to utilize their device’s built-in speakers. In a world where many of us are video-conferencing every day, speaker quality is important. We can measure how loud a speaker can get, with louder speakers being better for the user. (You can always turn a speaker down, but you can’t turn it up past its maximum point!) We can also pair this test with a jury study on speaker quality, asking a set of sample users what they like and dislike about the sound of the speakers. Together, the objective and subjective approaches paint a holistic picture of the experience of using a system’s speakers.