PT-Logo
Forgot your password?
BenchmarkXPRT Blog banner

Category: Servers

More details about CloudXPRT’s workloads

About a month ago, we posted an update on the CloudXPRT development process. Today, we want to provide more details about the three workloads we plan to offer in the initial preview build:

  • In the web-tier microservices workload, a simulated user logs in to a web application that does three things: provides a selection of stock options, performs Monte-Carlo simulations with those stocks, and presents the user with options that may be of interest. The workload reports performance in transactions per second, which testers can use to directly compare IaaS stacks and to evaluate whether any given stack is capable of meeting service-level agreement (SLA) thresholds.
  • The machine learning (ML) training workload calculates XGBoost model training time. XGBoost is a gradient-boosting framework  that data scientists often use for ML-based regression and classification problems. The purpose of the workload in the context of CloudXPRT is to evaluate how well an IaaS stack enables XGBoost to speed and optimize model training. The workload reports latency and throughput rates. As with the web-tier microservices workload, testers can use this workload’s metrics to compare IaaS stack performance and to evaluate whether any given stack is capable of meeting SLA thresholds.
  • The AI-themed container scaling workload starts up a container and uses a version of the AIXPRT harness to launch Wide and Deep recommender system inference tasks in the container. Each container represents a fixed amount of work, and as the number of Wide and Deep jobs increases, CloudXPRT launches more containers in parallel to handle the load. The workload reports both the startup time for the containers and the Wide and Deep throughput results. Testers can use this workload to compare container startup time between IaaS stacks; optimize the balance between resource allocation, capacity, and throughput on a given stack; and confirm whether a given stack is suitable for specific SLAs.

We’re continuing to move forward with CloudXPRT development and testing and hope to add more workloads in subsequent builds. Like most organizations, we’ve adjusted our work patterns to adapt to the COVID-19 situation. While this has slowed our progress a bit, we still hope to release the CloudXPRT preview build in April. If anything changes, we’ll let folks know as soon as possible here in the blog.

If you have any thoughts or comments about CloudXPRT workloads, please feel free to contact us.

Justin

CloudXPRT development news

Last month, Bill announced that we were starting work on a new data center benchmark. CloudXPRT will measure the performance of modern, cloud-first applications deployed on infrastructure as a service (IaaS) platformson-premises platforms, externally hosted platforms, and hybrid clouds that use a mix of the two. Our ultimate goal is for CloudXPRT to use cloud-native components on an actual stack to produce end-to-end performance metrics that can help users determine the right IaaS configuration for their business.

Today, we want to provide a quick update on CloudXPRT development and testing.

  • Installation. We’ve completely automated the CloudXPRT installation process, which leverages Kubernetes or Ansible tools depending on the target platform. The installation processes differ slightly for each platform, but testing is the same.
  • Workloads. We’re currently testing potential workloads that focus on three areas: web microservices, data analytics, and container scaling. We might not include all of these workloads in the first release, but we’ll keep the community informed and share more details about each workload as the picture becomes clearer. We are designing the workloads so that testers can use them to directly compare IaaS stacks and evaluate whether any given stack can meet service level agreement (SLA) thresholds.
  • Platforms. We want CloudXPRT to eventually support testing on a variety of popular externally hosted platforms. However, constructing a cross-platform benchmark is complicated and we haven’t yet decided which external platforms the first CloudXPRT release will support. We’ve successfully tested the current build with on-premises IaaS stacks and with one externally hosted platform, Amazon Web Services. Next, we will test the build on Google Cloud Hosting and Microsoft Azure.
  • Timeline. We are on track to meet our target of releasing a CloudXPRT preview build in late March and the first official build about two months later. If anything changes, we’ll post an updated timeline here in the blog.

If you would like to share any thoughts or comments related to CloudXPRT or cloud benchmarking, please feel free to contact us.

Justin

Understanding AIXPRT’s default number of requests

A few weeks ago, we discussed how AIXPRT testers can adjust the key variables of batch size, levels of precision, and number of concurrent instances by editing the JSON test configuration file in the AIXPRT/Config directory. In addition to those key variables, there is another variable in the config file called “total_requests” that has a different default setting depending on the AIXPRT test package you choose. This setting can significantly affect a test run, so it’s important for testers to know how it works.

The total_requests variable specifies how many inference requests AIXPRT will send to a network (e.g., ResNet-50) during one test iteration at a given batch size (e.g., Batch 1, 2, 4, etc.). This simulates the inference demand that the end users place on the system. Because we designed AIXPRT to run on different types of hardware, it makes sense to set the default number of requests for each test package to suit the most likely hardware environment for that package.

For example, testing with OpenVINO on Windows aligns more closely with a consumer (i.e., desktop or laptop) scenario than testing with OpenVINO on Ubuntu, which is more typical of server/datacenter testing. Desktop testers require a much lower inference demand than server testers, so the default total_requests settings for the two packages reflect that. The default for the OpenVINO/Windows package is 500, while the default for the OpenVINO/Ubuntu package is 5,000.

Also, setting the number of requests so low that a system finishes each workload in less than 1 second can produce high run-to-run variation, so our default settings represent a lower boundary that will work well for common test scenarios.

Below, we provide the current default total_requests setting for each AIXPRT test package:

  • MXNet: 1,000
  • OpenVINO Ubuntu: 5,000
  • OpenVINO Windows: 500
  • TensorFlow Ubuntu: 100
  • TensorFlow Windows: 10
  • TensorRT Ubuntu: 5,000
  • TensorRT Windows: 500


Testers can adjust these variables in the config file according to their own needs. Finding the optimal combination of machine learning variables for each scenario is often a matter of trial and error, and the default settings represent what we think is a reasonable starting point for each test package.

To adjust the total_requests setting, start by locating and opening the JSON test configuration file in the AIXPRT/Config directory. Below, we show a section of the default config file (CPU_INT8.json) for the OpenVINO-Windows test package (AIXPRT_1.0_OpenVINO_Windows.zip). For each batch size, the total_requests setting appears at the bottom of the list of configurable variables. In this case, the default setting Is 500. Change the total_requests numerical value for each batch size in the config file, save your changes, and close the file.

Total requests snip

Note that if you are running multiple concurrent instances, OpenVINO and TensorRT automatically distribute the number of requests among the instances. MXNet and TensorFlow users must manually allocate the instances in the config file. You can find an example of how to structure manual allocation here. We hope to make this process automatic for all toolkits in a future update.

We hope this information helps you understand the total_requests setting, and why the default values differ from one test package to another. If you have any questions or comments about this or other aspects of AIXPRT, please let us know.

Justin

XPRTs in the datacenter

The XPRTs have been very successful on desktops, notebooks, tablets, and phones. People have run WebXPRT over 295,000 times. It and other benchmarks such as MobileXPRT, HDXPRT, and CrXPRT are important tools globally for evaluating device performance on various consumer and business client platforms.

We’ve begun branching out with tests for edge devices with AIXPRT, our new artificial intelligence benchmark. While typical consumers won’t be able to run AIXPRT on their devices initially, we feel that it is important for the XPRTs to play an active role in a critical emerging market. (We’ll have some updates on the AIXPRT front in the next few weeks.)

Recently, both community members and others have asked about the possibility of the XPRTs moving into the datacenter. Folks face challenges in evaluating the performance and suitability to task of such datacenter mainstays as servers, storage, networking infrastructure, clusters, and converged solutions. These challenges include the lack of easy-to-run benchmarks, the complexity and cost of the equipment (multi-tier servers, large amounts of storage, and fast networks) necessary to run tests, and confusion about best testing practices.

PT has a lot of expertise in measuring datacenter performance, as you can tell from the hundreds of datacenter-focused test reports on our website. We see great potential in our working with the BenchmarkXPRT Development Community to help in this area. It is very possible that, as with AIXPRT, our approach to datacenter benchmarks would differ from the approach we’ve taken with previous benchmarks. While we have ideas for useful benchmarks we might develop down the road, more immediate steps could be drafting white papers, developing testing guidelines, or working with vendors to set up a lab.

Right now, we’re trying to gauge the level of interest in having such tools and in helping us carry out these initiatives. What are the biggest challenges you face in datacenter-focused performance and suitability to task evaluations? Would you be willing to work with us in this area? We’d love to hear from you and will be reaching out to members of the community over the coming weeks.

As always, thanks for your help!

Bill

Check out the other XPRTs: