
WebXPRT 5: Starting to assemble the pieces

In our last blog post, we shared the exciting news that we’re currently working on WebXPRT 5 and described some of the ways the benchmark may evolve. In today’s post, we’ll revisit some of those points of emphasis and focus on potential workload changes in a bit more detail.

With any benchmark development project, there are always technical challenges you need to iron out. That is especially true with a cross-platform, browser-based benchmark like WebXPRT. Because we’re in the middle of exploring the technical feasibility of a few of the options we’ll mention, we’re not yet ready to say for certain that all these features will be available in the initial WebXPRT 5 release. We can, however, now paint a clearer picture of the overall direction we’re headed.

In the section below, you’ll find updated info on where we stand with respect to some of the key development focal points we discussed in our last post. If there’s an item from that post or previous posts that we didn’t mention below—such as updating the test harness—it doesn’t mean that we’re dropping that goal. We’re just focusing on workloads today.

One of our key goals with WebXPRT 5 is providing more AI-related workloads. In past blog posts, we’ve discussed the growing importance of local, browser-side AI. With WebXPRT 5, we’re investigating two ways that we can expand WebXPRT’s AI portfolio: 1) updating existing WebXPRT 4 AI-oriented workloads, and 2) adding all-new AI workloads.

Here are some possible ways those AI-related changes may play out in both categories:

Updating existing WebXPRT 4 AI-oriented workloads

  • Splitting the existing Organize Album using AI workload’s timed tasks—face detection and image classification—into two independent workloads.
  • Updating the face detection and image classification tasks with the latest version of the OpenCV.js computer vision and machine learning library.
  • Updating the Caffe deep learning framework for the face detection task.
  • Updating the ONNX-based SqueezeNet machine learning model for the image classification task.
  • Updating the version of the Tesseract.js OCR engine that WebXPRT uses in the Encrypt Notes and OCR Scan workload (a brief usage sketch follows this list).
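
To give a feel for the OCR piece, here’s a minimal, illustrative sketch of recognizing text in a receipt image with Tesseract.js. The image path is a placeholder, and this is not WebXPRT’s actual harness code:

```js
// Minimal OCR sketch with Tesseract.js (illustrative; not WebXPRT's harness code).
// 'receipt.png' stands in for the workload's receipt image.
import Tesseract from 'tesseract.js';

const { data } = await Tesseract.recognize('receipt.png', 'eng');
console.log(data.text); // the text extracted from the receipt image
```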

Potentially adding all-new AI workloads (either core or experimental workloads)

  • We’re exploring the idea of including a workload that uses an AI-powered segmentation model to blur the background of a video call.
  • We’re exploring the feasibility of including a local LLM chat workload (a rough sketch follows this list).
  • We would eventually like to include a WebGPU-based web AI framework for a computer vision workload.
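
To give a sense of what a local LLM chat workload might involve, here’s a rough sketch using the Transformers.js library. The model name is purely illustrative; we haven’t committed to any particular library or model:

```js
// Rough sketch of browser-side LLM text generation with Transformers.js.
// The model name is illustrative only; nothing here is final.
import { pipeline } from '@xenova/transformers';

const generator = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');
const output = await generator('User: What does WebXPRT measure?\nAssistant:', {
  max_new_tokens: 64, // keep the timed task bounded
});
console.log(output[0].generated_text);
```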

In addition to the goal of adding more AI, we previously discussed the possibility of adding non-AI WebGPU workloads. As a web API, WebGPU enables web-based applications—such as image-based GenAI and inference workloads—to directly access the graphics rendering and computational capabilities of a system’s GPU. In the future, WebXPRT 5 could use that technology to execute complex 3D rendering workloads.
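
As a simple illustration of what that access looks like from JavaScript, the generic sketch below feature-detects WebGPU and requests a GPU device, which is the starting point for any WebGPU compute or rendering work. (This is not WebXPRT code.)

```js
// Generic WebGPU bootstrap: detect support and acquire a device (illustrative).
if (!navigator.gpu) {
  throw new Error('WebGPU is not supported in this browser');
}
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
// From here, device.createComputePipeline() and friends can dispatch
// compute shaders, and render pipelines can issue draw calls.
```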

We hope today’s post gives you a better sense of where WebXPRT 5 may be headed. We want to reemphasize that while we are actively investigating the possible changes mentioned above, nothing is set in stone. As the pieces start to fall into place, we’ll provide more information here in the blog.

If you have any questions or comments about WebXPRT 5, please feel free to contact us!

Justin

Browser-based AI tests in WebXPRT 4: optical character recognition

In our previous blog post, we discussed the rapidly expanding influence of AI-enhanced technologies in areas like everyday browser activity—and the growing need for objective performance data that can help us understand how well new consumer devices will handle AI tasks. We noted that WebXPRT 4 already includes timed AI tasks in two of its workloads—“Organize Album using AI” and “Encrypt Notes and OCR Scan”—and we provided some technical details for the Organize Album workload. In today’s post, we’ll focus on the Encrypt Notes and OCR Scan workload.

The Encrypt Notes and OCR Scan workload includes two separate scenarios that reflect common web-based productivity app tasks. The first scenario syncs a set of encrypted notes, and the second uses AI-based optical character recognition (OCR) to scan a receipt, extract data, and add that data to an expense report.

Here are the details for each scenario:

  • The encrypt notes scenario downloads a set of notes, encrypts that data, temporarily stores it in the browser’s localStorage object (via the localStorageDB.js database layer), and then decrypts and renders it for display. This scenario measures HTML5 Local Storage, JavaScript, AES encryption, and WebAssembly (Wasm) performance (a minimal sketch follows this list).
  • The OCR scan scenario uses a Wasm-based version of Tesseract.js (tesseract-core.wasm.js v2.20) to scan an expense receipt. Tesseract.js is a JavaScript port of the Tesseract OCR engine—a popular open-source C/C++ library that extracts text from images and PDFs. The scenario then adds the receipt to an expense report. This scenario measures HTML5 Local Storage, JavaScript, and Wasm performance. 
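
The workload’s encryption code is its own Wasm-based implementation, but for readers who want a feel for the encrypt-store-decrypt pattern in the first scenario, here’s a minimal sketch using the standard Web Crypto API and localStorage. It’s illustrative only and doesn’t mirror WebXPRT’s internals:

```js
// Illustrative encrypt/store/decrypt round trip (not WebXPRT's actual code).
const notes = 'meeting notes to sync';
const key = await crypto.subtle.generateKey(
  { name: 'AES-GCM', length: 256 }, true, ['encrypt', 'decrypt']
);
const iv = crypto.getRandomValues(new Uint8Array(12));

// Encrypt the notes and store the ciphertext in localStorage as base64.
const ciphertext = await crypto.subtle.encrypt(
  { name: 'AES-GCM', iv }, key, new TextEncoder().encode(notes)
);
localStorage.setItem(
  'notes', btoa(String.fromCharCode(...new Uint8Array(ciphertext)))
);

// Read the ciphertext back, decrypt it, and render the plaintext.
const stored = Uint8Array.from(
  atob(localStorage.getItem('notes')), (c) => c.charCodeAt(0)
);
const plaintext = await crypto.subtle.decrypt(
  { name: 'AES-GCM', iv }, key, stored
);
console.log(new TextDecoder().decode(plaintext)); // 'meeting notes to sync'
```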

We mention this test under the AI umbrella in part because people sometimes use the term “OCR” to refer to a spectrum of AI and non-AI technologies. In this case, though, the specifics clearly place the workload in the AI category. The Wasm-based Tesseract library that we use in WebXPRT 4 is based on a version of the Tesseract engine (v4.x) that uses long short-term memory (LSTM). LSTM is a type of recurrent neural network (RNN) that is particularly well suited for processing and predicting sequential data, which makes it an unambiguously AI-based component of the Encrypt Notes and OCR Scan workload.

To produce a score for each iteration of the workload, WebXPRT calculates the total time that it takes for a system to sync (encrypt, decrypt, and render) the notes, use OCR to scan the receipt, and add the scanned data to an expense report. In a standard test, WebXPRT runs seven iterations of the entire six-workload performance suite before calculating an overall test score. You can find out more about the WebXPRT results calculation process here.

Along with our post on the Organize Album workload, we hope this information provides a deeper understanding of WebXPRT 4’s AI-equipped workloads. As we mentioned last time, if you want to explore the structure of these workloads in more detail, you can check out previous blog posts for information about how to access and use the WebXPRT 4 source code for free. You can also read more about WebXPRT’s overall structure and other workloads in the Exploring WebXPRT 4 white paper.

If you have any questions about WebXPRT 4, please let us know!

Justin

Browser-based AI tests in WebXPRT 4: face detection and image classification

I recently revisited an XPRT blog entry that we posted from CES Las Vegas back in 2020. In that post, I reflected on the show’s expanded AI emphasis, and I wondered if we were reaching a tipping point where AI-enhanced and AI-driven tools and applications would become a significant presence in people’s daily lives. It felt like we were approaching that point back then with the prevalence of AI-powered features such as image enhancement and text recommendation, among many others. Now, seamless AI integration with common online tasks has become so widespread that many people unknowingly benefit from AI interactions several times a day.

As AI’s role in areas like everyday browser activity continues to grow—along with our expectations for what our consumer devices should be able to handle—reliable AI-oriented benchmarking is more vital than ever. We need objective performance data that can help us understand how well a new desktop, laptop, tablet, or phone will handle AI tasks.

WebXPRT 4 already includes timed AI tasks in two of its workloads: the “Organize Album using AI” workload and the “Encrypt Notes and OCR Scan” workload. These two workloads reflect the types of light browser-side inference tasks that are now fairly common in consumer-oriented web apps and extensions. In today’s post, we’ll provide some technical information about the Organize Album workload. In a future post, we’ll do the same for the Encrypt Notes workload.

The Organize Album workload includes two different timed tasks that reflect a common scenario of organizing online photo albums. The workload utilizes the AI inference and JavaScript capabilities of the WebAssembly (Wasm) version of OpenCV.js—an open-source computer vision and machine learning library. In WebXPRT 4, we used OpenCV.js version 4.5.2.

Here are the details for each task:

  • The first task measures the time it takes to complete a face detection job with a set of five 720 x 480 photos that we sourced from commercial photo sites. The workload loads a Caffe deep learning framework model (res10_300x300_ssd_iter_140000_fp16.caffemodel) using the commands found here (a brief sketch follows this list).
  • The second task measures the time it takes to complete an image classification job (labeling based on object detection) with a different set of five 718 x 480 photos that we sourced from the ImageNet computer vision dataset. The workload loads an ONNX-based SqueezeNet machine learning model (squeezenet.onnx v 1.0) using the commands found here.
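
For readers who want a closer look at the face detection step, here’s a hedged sketch of loading the Caffe model and running a forward pass with OpenCV.js’s dnn module. It assumes the model files have already been written into OpenCV.js’s Emscripten virtual file system, and the canvas ID is a placeholder:

```js
// Sketch of the face detection step with OpenCV.js dnn (illustrative).
// Assumes deploy.prototxt and the .caffemodel are already present in
// OpenCV.js's virtual file system.
const net = cv.readNetFromCaffe(
  'deploy.prototxt',
  'res10_300x300_ssd_iter_140000_fp16.caffemodel'
);
const src = cv.imread('photoCanvas'); // placeholder canvas ID
const blob = cv.blobFromImage(
  src, 1.0, new cv.Size(300, 300),
  new cv.Scalar(104, 177, 123), // mean values commonly used with this model
  false, false
);
net.setInput(blob);
const detections = net.forward(); // rows hold confidence and box coordinates
src.delete(); blob.delete(); detections.delete(); // free Wasm heap memory
```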

To produce a score for each iteration of the workload, WebXPRT calculates the total time that it takes for a system to organize both albums. In a standard test, WebXPRT runs seven iterations of the entire six-workload performance suite before calculating an overall test score. You can find out more about the WebXPRT results calculation process here.

We hope this post will give you a better sense of how WebXPRT 4 measures one kind of AI performance. As a reminder, if you want to dig into the details at a more granular level, you can access the WebXPRT 4 source code for free. In previous blog posts, you can find information about how to access and use the code. You can also read more about WebXPRT’s overall structure and other workloads in the Exploring WebXPRT 4 white paper.

If you have any questions about this workload or any other aspect of WebXPRT 4, please let us know!

Justin

Web AI frameworks: Possible paths for the AI-focused WebXPRT 4 auxiliary workload

A few months ago, we announced that we’re moving forward with the development of a new auxiliary WebXPRT 4 workload focused on local, browser-side AI technology. Local AI has many potential benefits, and it now seems safe to say that it will be a common fixture of everyday life for many people in the future. As the growth of browser-based inference technology picks up steam, our goal is to equip WebXPRT 4 users with the ability to quickly and reliably evaluate how well devices can handle substantial local inference tasks in the browser.

To reach our goal, we’ll need to make many well-researched and carefully considered decisions along the development path. Throughout the decision-making process, we’ll be balancing our commitment to core XPRT values, such as ease of use and widespread compatibility, with the practical realities of working with rapidly changing emergent technologies. In today’s blog, we’re discussing one of the first decision points that we face—choosing a Web AI framework.

AI frameworks are suites of tools and libraries that serve as building blocks for developers to create new AI-based models and apps or integrate existing AI functions in custom ways. AI frameworks can be commercial, such as OpenAI, or open source, such as Hugging Face, PyTorch, and TensorFlow. Because the XPRTs are available at no cost for users and we publish our source code, open-source frameworks are the right choice for WebXPRT.

Because the new workload will focus on locally powered, browser-based inference tasks, we also need to choose an AI framework that has browser integration capabilities and does not rely on server-side computing. These types of frameworks—called Web AI—use JavaScript (JS) APIs and other web technologies, such as WebAssembly and WebGPU, to run machine learning (ML) tasks on a device’s CPU, GPU, or NPU.
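
To make that concrete, here’s a brief, hedged example of how a Web AI library can target different backends from JavaScript. It uses ONNX Runtime Web (one of the frameworks discussed below); the model path, tensor name, and shape are placeholders:

```js
// Sketch: selecting execution backends in ONNX Runtime Web (illustrative).
import * as ort from 'onnxruntime-web';

// Prefer WebGPU when available, falling back to WebAssembly on the CPU.
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm'],
});

// 'input' is a placeholder feed name; real models define their own.
const input = new ort.Tensor(
  'float32', new Float32Array(3 * 224 * 224), [1, 3, 224, 224]
);
const results = await session.run({ input });
```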

Several emerging Web AI frameworks may provide the compatibility and functionality we need for the future WebXPRT workload. Here are a few that we’re currently researching:

  • ONNX Runtime Web: Microsoft and other partners developed the Open Neural Network Exchange (ONNX) as an open standard for ML models. With available tools, users can convert models from several AI frameworks to ONNX, which can then be used by ONNX Runtime Web. ONNX Runtime Web allows developers to leverage the broad compatibility of ONNX-formatted ML models—including pre-trained vision, language, and GenAI models—in their web applications.
  • Transformers.js: Transformers.js, which uses ONNX Runtime Web, is a JS library that allows users to run AI models from the browser and offline. Transformers.js supports language, computer vision, and audio ML models, among others (a brief sketch follows this list).
  • MediaPipe: Google developed MediaPipe as a way for developers to adapt TensorFlow-based models for use across many platforms in real-time on-device inference applications such as face detection and gesture recognition. MediaPipe is particularly useful for inference work in images, videos, and live streaming.
  • TensorFlow.js: TensorFlow has been around for a long time, and the TensorFlow ecosystem provides users with a broad variety of models and datasets. TensorFlow is an end-to-end ML solution—training to inference—but with available pre-trained models, developers can focus on inference. TensorFlow.js is an open-source JS library that helps developers integrate TensorFlow with web apps.
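
As a brief illustration of the Transformers.js pipeline API mentioned above, here’s a sketch of a one-call image classification task. The model name and image URL are placeholders:

```js
// Sketch: one-call image classification with Transformers.js (illustrative).
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline(
  'image-classification',
  'Xenova/vit-base-patch16-224' // placeholder model choice
);
const predictions = await classifier('photo.jpg'); // placeholder image URL
// e.g., [{ label: 'golden retriever', score: 0.97 }, ...]
```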

We have not made final decisions about a Web AI framework or any aspect of the future workload. We’re still in the research, discussion, and experimentation stages of development, but we want to be transparent with our readers about where we are in the process. In future blog posts, we’ll discuss some of the other major decision points in play.

Most of all, we invite you to join us in these discussions, make recommendations, and give us any other feedback or suggestions you may have, so please feel free to share your thoughts!

Justin

Local AI and new frontiers for performance evaluation

Recently, we discussed some ways the PC market may evolve in 2024, and how new Windows on Arm PCs could present the XPRTs with many opportunities for benchmarking. In addition to a potential market shakeup from Arm-based PCs in the coming years, there’s a much broader emerging trend that could eventually revolutionize almost everything about the way we interact with our personal devices—the development of local, dedicated AI processing units for consumer-oriented tech.

AI already impacts daily life for many consumers through technologies such as predictive text, computer vision, adaptive workflow apps, voice recognition, smart assistants, and much more. Generative AI-based technologies are rapidly establishing a permanent, society-altering presence across a wide range of industries. Aside from some localized inference tasks that the CPU and/or GPU typically handle, the bulk of the heavy compute power that fuels those technologies has resided in the cloud or in on-prem servers. Now, several major chipmakers are working to roll out their own versions of AI-optimized neural processing units (NPUs) that will enable local devices to take on a larger share of the AI load.

Examples of dedicated AI hardware in recently released or upcoming consumer devices include Intel’s new Meteor Lake NPU, Apple’s Neural Engine for M-series SoCs, Qualcomm’s Hexagon NPU, and AMD’s XDNA 2 architecture. The potential benefits of localized, NPU-facilitated AI are straightforward. On-device AI could reduce power consumption and extend battery life by offloading those tasks from the CPUs. It could alleviate certain cloud-related privacy and security concerns. Without the delays inherent in cloud queries, localized AI could execute inference tasks that operate much closer to real time. NPU-powered devices could fine-tune applications around your habits and preferences, even while offline. You could pull and utilize relevant data from cloud-based datasets without pushing private data in return. Theoretically, your device could know a great deal about you and enhance many areas of your daily life without passing all that data to another party.

Will localized AI play out that way? Some tech companies envision a role for on-device AI that enhances the abilities of existing cloud-based subscription services without decoupling personal data. We’ll likely see a wide variety of capabilities and services on offer, with application-specific and SaaS-determined privacy options.

Regardless of the way on-device AI technology evolves in the coming years, it presents an exciting new frontier for benchmarking. Not all NPUs will be created equal, and that’s something buyers will need to understand. Some vendors will optimize their hardware more for computer vision, others for large language models or AI-based graphics rendering, and so on. It won’t be enough for businesses and consumers to simply know that a new system has dedicated AI processing abilities. They’ll need to know how well that system performs while handling the types of AI-related tasks that they do every day.

Here at the XPRTs, we specialize in creating benchmarks built around real-world scenarios that mirror the tasks people do in their daily lives. That approach means that when people use XPRT scores to compare device performance, they’re using a metric that can help them make a buying decision they’ll benefit from every day. We look forward to exploring ways that we can bring XPRT benchmarking expertise to the world of on-device AI.

Do you have ideas for future localized AI workloads? Let us know!

Justin

The AIXPRT learning tool is now live (and a CloudXPRT version is on the way)!

We’re happy to announce that the AIXPRT learning tool is now live! We designed the tool to serve as an information hub for common AIXPRT topics and questions, and to help tech journalists, OEM lab engineers, and everyone who is interested in AIXPRT find the answers they need in as little time as possible.

The tool features four primary areas of content:

  • The Q&A section provides quick answers to the questions we receive most from testers and the tech press.
  • The AIXPRT: the basics section describes specific topics such as the benchmark’s toolkits, networks, workloads, and hardware and software requirements.
  • The testing and results section covers the testing process, metrics, and how to publish results.
  • The AI/ML primer provides brief, easy-to-understand definitions of key AI and ML terms and concepts for those who want to learn more about the subject.

The first screenshot below shows the tool’s home screen. The second shows the Inference tasks (workloads) entry in the AI/ML primer section, an example of how the popup information sections appear.

We’re excited about the new AIXPRT learning tool, and we’re also happy to report that we’re working on a version of the tool for CloudXPRT. We hope to make the CloudXPRT tool available early next year, and we’ll post more information in the blog as we get closer to taking it live.

If you have any questions about the tool, please let us know!

Justin
