A few months ago, we announced
that we’re moving forward with the development of a new auxiliary
WebXPRT 4 workload focused on local, browser-side AI technology. Local AI has many
potential benefits,
and it now seems safe to say that it will be a common fixture of everyday life
for many people in the future. As browser-based inference
technology gains momentum, our goal is to give WebXPRT 4 users a
quick, reliable way to evaluate how well their devices handle
substantial local inference tasks in the browser.
To
reach our goal, we’ll need to make many well-researched and carefully
considered decisions along the development path. Throughout the decision-making
process, we’ll be balancing our commitment to core XPRT values, such as ease of
use and widespread compatibility, with the practical realities of working with
rapidly changing emergent technologies. In today’s blog, we’re discussing one
of the first decision points that we face—choosing a Web AI framework.
AI
frameworks are suites of tools and libraries that serve as building blocks for
developers to create new AI-based models and apps or integrate existing AI
functions in custom ways. AI frameworks can be commercial, such as those from OpenAI, or open source, such as Hugging Face's Transformers, PyTorch, and TensorFlow. Because the XPRTs are available
at no cost for users and we publish our source code, open-source frameworks are
the right choice for WebXPRT.
Because the new workload will focus on locally powered, browser-based inference tasks, we also need to choose an AI framework that has browser integration capabilities and does not rely on server-side computing. These types of frameworks—called Web AI—use JavaScript (JS) APIs and other web technologies, such as WebAssembly and WebGPU, to run machine learning (ML) tasks on a device’s CPU, GPU, or NPU.
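To make that stack a little more concrete, here's a minimal sketch (not WebXPRT code) of how a page might probe for those backends before picking one. navigator.gpu (WebGPU) and the WebAssembly global are standard browser APIs; the fallback order is just an illustration.

```js
// Minimal sketch: probe for the backends Web AI frameworks typically target.
// navigator.gpu and WebAssembly are standard browser APIs; everything else
// here is illustrative.
async function detectBackends() {
  const backends = [];

  // WebGPU: requestAdapter() resolves to null if no suitable GPU is available.
  if (navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) backends.push('webgpu');
  }

  // WebAssembly is the usual CPU fallback for in-browser inference.
  if (typeof WebAssembly === 'object') {
    backends.push('wasm');
  }

  return backends; // e.g., ['webgpu', 'wasm'] on a capable machine
}

detectBackends().then((found) => console.log('Available backends:', found));
```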
Several emerging Web AI frameworks may provide the compatibility and functionality we need for the future WebXPRT workload. Here are a few that we're currently researching, along with brief, illustrative code sketches after the list:
- ONNX Runtime Web: Microsoft and other partners developed the Open Neural Network Exchange (ONNX) as an open standard for ML models. With available tools, users can convert models from several AI frameworks to ONNX, which can then be used by ONNX Runtime Web. ONNX Runtime Web allows developers to leverage the broad compatibility of ONNX-formatted ML models—including pre-trained vision, language, and GenAI models—in their web applications.
- Transformers.js: Transformers.js, which uses ONNX Runtime Web under the hood, is a JS library that allows users to run AI models in the browser, even offline. Transformers.js supports language, computer vision, and audio ML models, among others.
- MediaPipe: Google developed MediaPipe as a way for developers to adapt TensorFlow-based models for use across many platforms in real-time, on-device inference applications such as face detection and gesture recognition. MediaPipe is particularly useful for inference on images, video, and live streams.
- TensorFlow.js: TensorFlow, first released in 2015, has one of the most mature ML ecosystems, providing users with a broad variety of models and datasets. TensorFlow is an end-to-end ML solution, spanning training to inference, but with available pre-trained models, developers can focus on inference. TensorFlow.js is an open-source JS library that helps developers run TensorFlow models in web apps.
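To make these options more concrete, here's roughly what loading and running a model looks like in each framework, starting with ONNX Runtime Web. These are sketches, not WebXPRT code; the model file, input name, and tensor shape below are placeholders for whatever model you actually use.

```js
// Sketch: load and run an ONNX model with ONNX Runtime Web. The API calls
// (InferenceSession.create, ort.Tensor, session.run) are real; the model
// file, input name, and shape are placeholders.
import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu', 'wasm'], // tried left to right
});

// A dummy image-shaped input; real input names and shapes come from the model.
const feeds = {
  input: new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]),
};
const results = await session.run(feeds);
console.log(results);
```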
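Transformers.js wraps that same runtime in a higher-level API. A minimal sketch, with an example task and text of our own choosing:

```js
// Sketch: Transformers.js reduces the same flow to a pipeline() call.
// The package is published as @huggingface/transformers (older releases
// used @xenova/transformers); the task and text here are just examples.
import { pipeline } from '@huggingface/transformers';

// Downloads and caches a default model for the task on first use.
const classifier = await pipeline('sentiment-analysis');

const result = await classifier('Browser-side AI benchmarking is exciting.');
console.log(result); // e.g., [{ label: 'POSITIVE', score: 0.99 }]
```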
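MediaPipe's Tasks Vision API follows a similar create-then-run pattern. Another sketch, with placeholder model and image assets:

```js
// Sketch: face detection with MediaPipe's Tasks Vision API. FilesetResolver
// and FaceDetector are real @mediapipe/tasks-vision exports; the WASM URL,
// model asset, and image element are placeholders.
import { FilesetResolver, FaceDetector } from '@mediapipe/tasks-vision';

// Load the WASM runtime, then build a detector from a face-detection model.
const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const detector = await FaceDetector.createFromOptions(vision, {
  baseOptions: { modelAssetPath: 'blaze_face_short_range.tflite' },
  runningMode: 'IMAGE',
});

const image = document.getElementById('photo'); // an <img> element on the page
const { detections } = detector.detect(image);
console.log(`Found ${detections.length} face(s)`);
```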
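And a comparable TensorFlow.js sketch, with a placeholder model URL and a dummy image-shaped input:

```js
// Sketch: load a pre-trained TensorFlow.js graph model and run one inference.
// tf.loadGraphModel, tf.tidy, and model.predict are real API calls; the model
// URL and input shape are placeholders.
import * as tf from '@tensorflow/tfjs';

const model = await tf.loadGraphModel('https://example.com/model/model.json');

// tf.tidy disposes intermediate tensors so GPU memory doesn't accumulate;
// tensors returned from the callback are kept.
const output = tf.tidy(() => {
  const input = tf.zeros([1, 224, 224, 3]); // dummy image-shaped input
  return model.predict(input); // a single Tensor for single-output models
});

output.print();
output.dispose();
```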
We have not made final decisions
about a Web AI framework or any aspect of the future workload. We’re still in
the research, discussion, and experimentation stages of development, but we
want to be transparent with our readers about where we are in the process. In
future blog posts, we’ll discuss some of the other major decision points in
play.
Most
of all, we invite you to join us in these discussions, make recommendations,
and give us any other feedback or suggestions you may have. Please feel free
to share your thoughts!
Justin