Google and Baidu collaborated with researchers at Harvard and Stanford to define a suite of benchmarks for machine learning (ML).
The 'MLPerf' effort aims to build a common set of benchmarks that enables the machine learning field to measure system performance for both training and inference from mobile devices to cloud services.
So far, AMD, Intel, two AI startups, and three other universities have expressed support for MLPerf, an initial version of which will be ready for use in August.
Those supporters include the University of California at Berkeley, the University of Minnesota, and the University of Toronto, as well as the startups SambaNova and Wave Computing.
The goal of MLPerf is to help companies and researchers accelerate progress in ML through fair and useful measurement. The suite is meant to enable fair comparison of competing systems and to enforce replicability for reliable results, while keeping the benchmarking effort affordable so that all can participate.
The first release of MLPerf will focus on training jobs on a range of systems, from workstations to large data centers, a major pain point for web giants such as Baidu and Google. Later releases will expand to inference jobs and will eventually be extended to cover jobs run on embedded client systems.
An early version of the suite running on a variety of AI frameworks will be ready to run in about three months.
Initially, MLPerf will measure the average time to train a model to a minimum quality, probably reported in hours. Because these jobs run on large banks of servers, the suite may not report performance per watt. It will take the cost of jobs into consideration, provided that pricing does not vary with the time of day at which they are run.
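The headline metric described above, wall-clock time to reach a minimum quality target, can be sketched as follows. This is an illustrative outline only, not code from the MLPerf suite; the function names, the epoch-based loop, and the quality threshold are all assumptions for the sake of the example.

```python
import time

def time_to_train(train_one_epoch, evaluate, target_quality, max_epochs=100):
    """Run training epochs until evaluation quality reaches the target.

    Returns elapsed wall-clock time in hours, mirroring the kind of
    time-to-quality measurement MLPerf is described as reporting.
    (Hypothetical sketch: callables and threshold are illustrative.)
    """
    start = time.time()
    for _ in range(max_epochs):
        train_one_epoch()                 # one pass over the training data
        if evaluate() >= target_quality:  # stop once minimum quality is met
            return (time.time() - start) / 3600.0
    raise RuntimeError("target quality not reached within max_epochs")
```

Measuring to a fixed quality target, rather than a fixed number of epochs, is what makes results comparable across systems that converge at different speeds.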
Nvidia's Volta-based V100 chip will serve as a reference standard because it is widely employed in data centers for training. The group aims to update published results every three months.
MLPerf will use two modes. A Closed Model Division specifies the model to be used and restricts the values of hyperparameters such as batch size and learning rate; the emphasis here is on fair comparison of hardware and software systems. (The Sort benchmark's equivalent is called "Daytona," alluding to the stock cars raced in the Daytona 500.)
In the MLPerf Open Model Division, competitors must solve the same problem using the same data set but with fewer restrictions; the emphasis here is on advancing the state of the art in ML. (The Sort benchmark's equivalent was called "Indy," alluding to the faster custom-built race cars designed for events like the Indianapolis 500.)