BenchmarkXPRT Blog banner

Category: Machine learning

Evaluating machine learning performance

A few weeks ago, I discussed the rising importance of machine learning and our efforts to develop a tool to help in evaluating its performance. Here is an update on our thinking.

One thing we are sure of is that we can’t cover everything in machine learning. The field is evolving rapidly, so we think the best approach is to pick a good place to start and then build from there.

One of the key areas we need to home in on is the algorithms that we will employ in MLXPRT. (We haven’t formally decided on a name, but are currently using MLXPRT internally when we talk about what we’ve been doing.)

Computer vision, or image detection, seems to be a good place to start. We see three specific sets of algorithms that we might cover. It’s worth noting that the lines between these sets are often blurred.

The first set of computer vision algorithms performs image classification. These algorithms identify things like a cat or a dog in an image. Some of the most popular algorithms are AlexNet and GoogLeNet, as well as ones from VGG. These were initially trained and used on the ImageNet database, which contains over 10 million images.
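To make the idea concrete, here is a minimal Python sketch of the last step of image classification: turning a network's raw per-class scores into probabilities and picking the most likely labels. The labels and scores are made up for illustration and do not come from AlexNet, GoogLeNet, or VGG. The helper names (softmax, top_k) are our own as well.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, labels, k=1):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical raw scores a classifier might emit for one image.
labels = ["cat", "dog", "car"]
scores = [2.0, 1.0, 0.1]

best = top_k(scores, labels, k=2)
print(best)
```

Results on ImageNet are commonly reported as top-1 and top-5 accuracy, which is why the sketch returns a ranked list rather than a single answer.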

The next set of algorithms in computer vision performs object detection and localization. The algorithms identify the contents and their spatial location in an image, and typically draw bounding boxes around them. A couple of the most popular algorithms are Faster R-CNN and Single Shot MultiBox Detector (SSD).
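One common way to judge how well a predicted bounding box matches the true one is intersection over union (IoU): the area the two boxes share divided by the area they cover together. The Python sketch below is only an illustration. The (x1, y1, x2, y2) box format and the sample coordinates are assumptions of ours, not something taken from Faster R-CNN or SSD.

```python
def iou(box_a, box_b):
    """Intersection over union for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and ground-truth boxes.
predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)

overlap = iou(predicted, ground_truth)
print(round(overlap, 3))
```

An IoU of 1.0 means the boxes coincide exactly, and 0.0 means they do not overlap at all.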

The final set of computer vision algorithms performs image segmentation. Rather than just drawing a box around an object, image segmentation attempts to classify each pixel in an image by the object it is a part of. The result looks like a contour/color map that shows the different objects in the image. These techniques can be especially useful in autonomous vehicles and medical diagnostic imaging. Currently, the leading algorithms in image segmentation are fully convolutional networks (FCNs), but the area is developing rapidly.
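The per-pixel classification step can be sketched in a few lines of Python. We assume here that a segmentation network has already produced one score map per class, and the sketch simply picks the highest-scoring class at each pixel. The tiny 2x3 score maps are invented for illustration, and this is not how an FCN itself is implemented.

```python
def label_map(score_maps):
    """Pick the best class per pixel.

    score_maps[c][y][x] is the score for class c at pixel (x, y).
    Returns a 2D list of class indices with the same height and width.
    """
    num_classes = len(score_maps)
    height = len(score_maps[0])
    width = len(score_maps[0][0])
    return [
        [max(range(num_classes), key=lambda c: score_maps[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]

# Hypothetical scores for two classes (0 = background, 1 = object).
background = [[0.9, 0.2, 0.1],
              [0.8, 0.7, 0.3]]
obj = [[0.1, 0.8, 0.9],
       [0.2, 0.3, 0.7]]

segmentation = label_map([background, obj])
for row in segmentation:
    print(row)
```

Coloring each class index differently would give the contour/color map described above.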

Even limiting the initial version of MLXPRT to computer vision may be too broad. For example, we may end up only doing image classification and object detection.

As always, we crave input from folks, like yourself, who are working in these areas. What would you most like to see in a machine learning performance tool?

Bill

Learning about machine learning

Everywhere we look, machine learning is in the news. It’s driving cars and beating the world’s best Go players. Whether we are aware of it or not, it’s in our lives–understanding our voices and identifying our pictures.

Our goal of being able to measure the performance of hardware and software that does machine learning seems more relevant than ever. Our challenge is to scan the vast landscape that is machine learning, and identify which elements to measure first.

There is a natural temptation to see machine learning as being all about neural networks such as AlexNet and GoogLeNet. However, innovations appear all the time and lots of important work with more classic machine learning techniques is also underway. (Classic machine learning being anything more than a few years old!) Recurrent neural networks used for language translation, reinforcement learning used in robotics, and support vector machine (SVM) learning used in text recognition are just a few examples among the wide array of algorithms to consider.

Creating a benchmark or set of benchmarks to cover all those areas, however, is unlikely to be possible. Certainly, creating such an ambitious tool would take so long that it would be of limited usefulness.

Our current thinking is to begin with a small set of representative algorithms. The challenge, of course, is identifying them. That’s where you come in. What would you like to start with?

We anticipate that the benchmark will focus on the types of inference and light training that are likely to occur on edge devices. Extensive training with large datasets takes place in data centers or on systems with extraordinary computing capabilities. We’re interested in use cases that will stress the local processing power of everyday devices.
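As a rough idea of what measuring inference on a device could involve, here is a minimal Python timing sketch. It is not the design of any XPRT tool. The function names, the warm-up and repeat counts, and the stand-in workload are all assumptions of ours, and a real test would call an actual model instead of fake_inference.

```python
import statistics
import time

def time_inference(run_once, warmup=3, repeats=10):
    """Time a callable and report its median latency and throughput."""
    # Warm-up runs let caches and lazy initialization settle before timing.
    for _ in range(warmup):
        run_once()

    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - start)

    median = statistics.median(latencies)
    return {
        "median_seconds": median,
        "runs_per_second": 1.0 / median if median > 0 else float("inf"),
    }

def fake_inference():
    """Stand-in workload; a real test would run a model on one input."""
    return sum(i * i for i in range(10_000))

result = time_inference(fake_inference)
print(result)
```

Reporting the median rather than the mean keeps one unusually slow run from skewing the result.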

We are, of course, reaching out to folks in the machine learning field—including those in academia, those who create the underlying hardware and software, and those who make the products that rely on that hardware and software.

What do you think?

Bill

Thinking ahead to WebXPRT 2017

A few months ago, Bill discussed our intention to update WebXPRT this year. Today, we want to share some initial ideas for WebXPRT 2017 and ask for your input.

Updates to the workloads provide an opportunity to increase the relevance and value of WebXPRT in the years to come. Here are a few of the ideas we’re considering:

  • For the Photo Enhancement workload, we can increase the data sizes of pictures. We can also experiment with additional types of photo enhancement such as background/foreground subtraction, collage creation, or panoramic/360-degree image viewing.
  • For the Organize Album workload, we can explore machine learning workloads by incorporating open source JavaScript libraries into web-based inferencing tests.
  • For the Local Notes workload, we’re investigating the possibility of leveraging natural-brain libraries for language processing functions.
  • For a new workload, we’re investigating the possibility of using online 3D modeling applications such as Tinkercad.

 
For the UI, we’re considering improvements to features like the in-test progress bars and individual subtest selection. We’re also planning to update the UI to make it visually distinct from older versions.

Throughout this process, we want to be careful to maintain the features that have made WebXPRT our most popular tool, with more than 141,000 runs to date. We’re committed to making sure that it runs quickly and simply in most browsers and produces results that are useful for comparing web browsing performance across a wide variety of devices.

Do you have feedback on these ideas or suggestions for browser technologies or test scenarios that we should consider for WebXPRT 2017? Are there existing features we should ditch? Are there elements of the UI that you find especially useful or would like to see improved? Please let us know. We want to hear from you and make sure that we’re crafting a performance tool that continues to meet your needs.

Justin

Reflecting on 2016

The beginning of a new year is a good time to look back on the previous 12 months and take stock of everything that happened. Here’s a quick recap of a very busy year:

In 2016, the XPRTs travelled quite a bit. Eric went to CES in Las Vegas, Mark attended MWC in Barcelona, and Bill flew out to IDF16 in Shenzhen.

We also sent a team to Seattle for the first XPRT Women Code-a-thon, an event we’re very proud to have sponsored and co-hosted along with ChickTech, a nonprofit organization dedicated to increasing the number of women in tech-related fields. The Code-a-thon also served as inspiration for an eight-part video series entitled Women Coding for Change. The series explains the motivation behind the Code-a-thon and profiles several of the participants. If you haven’t watched the videos, check them out. They’re well worth the time.

Speaking of videos, we also published one about Nebula Wolf, the mini-game workload produced through our first collaboration with the North Carolina State Senior Design Center. That experience was promising enough for us to partner with another student team this past fall, which resulted in a virtual reality app that we hope to share with the community in the near future.

Of course, we also continued work on our suite of benchmark tools and related resources. We released TouchXPRT 2016 to the public, published the Exploring TouchXPRT 2016 white paper, and released the TouchXPRT 2016 source code to community members.

In 2016, we unveiled the XPRT Weekly Tech Spotlight, a new way for device vendors and manufacturers to share verified test results with buyers around the world. We put 46 devices in the spotlight throughout the year and published Back-to-School, Black Friday, and Holiday device showcases.

In the last quarter of 2016, we celebrated our most widely used benchmark, WebXPRT, passing the 100,000-run milestone. WebXPRT is still going strong and is as useful and relevant as ever!

Finally, we ended the year with the exciting news that we’re moving forward with efforts to develop a machine-learning performance evaluation tool. We look forward to engaging with the community in the coming year as we tackle this challenge!

As always, we’re grateful for everyone who’s helped to make the BenchmarkXPRT Development Community a strong, vibrant, and relevant resource for people all around the world. Here’s to a great 2017!

Justin

Creating a machine-learning benchmark

Recently, we wrote about one of the most exciting emerging technology areas, machine learning, and the question of what role the XPRTs could play in the field.

Experts expect machine learning to be the analytics backbone of the IoT data explosion. It is a disruptive technology with potential to influence a broad range of industries. Consumer and industrial applications that take advantage of machine-learning advancements in computer vision, natural language processing, and data analytics are already available and many more are on the way.

Currently, there is no comprehensive machine-learning or deep-learning benchmark that includes home, automotive, industrial, and retail use cases. The challenge with developing a benchmark for machine learning is that these are still the early days of the technology. A fragmented software and hardware landscape and a lack of standardized implementations make benchmarking machine learning complex and challenging.

Based on the conversations we’ve had over the last few weeks, we’ve decided to take on that challenge. With the community’s help, of course!

As we outlined in a blog entry last month, we will work with interested folks in the community, key vendors, and academia to pull together what we are internally calling MLXPRT.

While the result may differ substantially from the existing XPRTs, we think the need for something is great. Whether that will turn out to be a packaged tool or just sample code and workloads remains to be seen.

What we need most is your help. We need both general input about what you would like to see as well as any expertise you may have. Let us know any questions you may have or ways you can help.

On a related note, I’ll be at CES 2017 in Las Vegas during the first week of January. I’d love to meet and talk more about machine learning, benchmarking, or the XPRTs. If you’re planning to be there and would like to connect, let us know.

We will not have a blog entry next week over the holidays, so we wish all of you a wonderful time with your families and a great start to the new year.

Bill

Machine learning

A couple of months ago, I wrote about doing an inventory of our XPRT tools. Part of that is taking a close look at the six existing XPRTs. The first result of that effort was what I recently wrote about HDXPRT. We’re also looking at emerging technology areas where the BenchmarkXPRT Community has expertise that can guide us.

One of the most exciting of these areas is machine learning. It has rapidly gone from interesting theoretical research (they called them “neural nets” back when I was getting my computer science degree) to something we all use whether we realize it or not. Machine learning (or deep learning) is in everything from intelligent home assistants to autonomous automobiles to industrial device monitoring to personalized shopping in retail environments.

The challenge with developing a benchmark for machine learning is that these are still the early days of the technology. In the past, XPRTs have targeted technologies later in the product cycle. We’re wondering how the XPRT model and the members of its community can play a role here.

One possible use of a machine-learning XPRT is with drones, a market that includes many vendors. Consumers, hobbyists, builders, and the companies creating off-the-shelf models could all benefit from tools and techniques that fairly compare drone performance.

The best approach we’ve come up with to define a machine-learning XPRT starts with identifying common areas such as computer vision, natural language processing, and data analytics, and then, within each of those areas, identifying common algorithms such as AlexNet, GoogLeNet, and VGG. We would also look at the commonly used frameworks such as Caffe, Theano, TensorFlow, and CNTK.

The result might differ from an existing XPRT where you simply run a tool and get a result. Instead, it might take the form of sample code and workloads. Or, maybe even one or two executables that could be used in the most common environments.

At this point, our biggest question is, What do you think? Is this an area you’re interested in? If so, what would you like to see a machine-learning XPRT do?

We’re actively engaging with people in these emerging markets to gauge their interest as well. Regardless of the feedback, we’re excited about the possibilities!

Bill
