Intel compared the inference performance of two of its most expensive CPUs to NVIDIA GPUs and claimed victory, though NVIDIA disagrees.
Intel said it had achieved leadership performance of 7,878 images per second on ResNet-50 with its latest generation of Intel Xeon Scalable processors, edging out the 7,844 images per second of the NVIDIA Tesla V100, the best GPU performance NVIDIA has published on its website, including the T4.
Specifically, Intel used a two-socket Intel Xeon Platinum 9282 processor, a high core-count, multi-chip packaged server multiprocessor, running Intel Optimization for Caffe. Intel achieved 7,878 images per second by simultaneously running 28 software instances, each pinned to four cores, with a batch size of 11. For comparison, NVIDIA's published numbers are 7,844 images per second on the Tesla V100 and 4,944 images per second on the Tesla T4.
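As a back-of-the-envelope check, the configuration Intel describes can be verified with simple arithmetic. All figures below come straight from the article (two 56-core sockets, 28 instances of four cores each, batch size 11); the per-instance numbers are derived, not published by Intel.

```python
# Sanity-check Intel's stated configuration using only the published figures.
cores_total = 2 * 56        # dual-socket Xeon Platinum 9282, 56 cores per socket
instances = 28              # concurrent Caffe software instances
cores_per_instance = 4      # cores pinned to each instance
batch_size = 11             # images per batch, per instance
total_throughput = 7878     # images/sec, Intel's claimed aggregate

# 28 instances x 4 cores = 112 cores, i.e. every core is occupied.
assert instances * cores_per_instance == cores_total

per_instance = total_throughput / instances
print(f"~{per_instance:.0f} images/sec per 4-core instance")
print(f"~{per_instance / batch_size:.1f} batches/sec per instance")
```

The takeaway is that the headline number is an aggregate across 112 fully loaded cores, each small instance contributing roughly 281 images per second.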
However, to match the performance of a single mainstream NVIDIA V100 GPU, Intel combined two power-hungry, highest-end CPUs with an estimated combined price of $50,000-$100,000.
"Intel’s performance comparison highlighted the clear advantage of NVIDIA T4 GPUs, which are built for inference. When compared to a single highest-end CPU, they’re not only faster but also 7x more energy-efficient and an order of magnitude more cost-efficient," Nvidia said.
Intel Xeon Scalable processors are widely available in clouds and data centers, and Intel says that a CPU with strong deep learning capabilities gives AI customers the flexibility to manage their compute infrastructure uniformly and cost-effectively.
Deep learning is used in image/video processing, natural language processing, personalized recommender systems, and reinforcement learning, and the range of workloads and algorithms is rapidly expanding. Intel claims that a general-purpose CPU is well suited to this dynamically changing environment.
Intel’s latest Cascade Lake CPUs do include new instructions (the AVX-512 VNNI instructions marketed as Intel DL Boost) that accelerate inference, arguably making them the best CPUs for the task. However, NVIDIA says these Xeons are hardly competitive with its deep learning-optimized Tensor Core GPUs.
Inference (also known as prediction), in simple terms, is the “pattern recognition” that a neural network does after being trained. It’s where AI models provide intelligent capabilities in applications, like detecting fraud in financial transactions, conversing in natural language to search the internet, and predictive analytics to fix manufacturing breakdowns before they even happen.
While most AI inference today happens on CPUs, NVIDIA Tensor Core GPUs are also being adopted across the full range of AI models. Tensor Cores have transformed NVIDIA GPUs into highly efficient AI processors. They perform multi-precision calculations at high rates to provide suitable precision for diverse AI models, and they are supported automatically in popular AI frameworks.
A measure of the complexity of AI models is the number of parameters they have. Parameters in an AI model are the variables that store information the model has learned. While ResNet-50 has 25 million parameters, BERT has 340 million, a 13x increase.
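The claimed increase follows directly from the two parameter counts the article cites (the ratio works out to roughly 13.6x):

```python
# Model sizes as cited in the article (number of learned parameters).
resnet50_params = 25_000_000    # ResNet-50
bert_params = 340_000_000       # BERT

ratio = bert_params / resnet50_params
print(f"BERT is ~{ratio:.1f}x larger than ResNet-50")
```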
So NVIDIA struck back at Intel, saying that on an advanced model like BERT, a single NVIDIA T4 GPU is 56x faster than a dual-socket CPU server (dual Intel Xeon Gold 6240) and 240x more power-efficient.
Another key usage of AI is in recommendation systems, which are used to provide relevant content recommendations on video sharing sites, news feeds on social sites and product recommendations on e-commerce sites.
Neural collaborative filtering, or NCF, is a recommender system that uses the prior interactions of users with items to provide recommendations. According to NVIDIA, when running inference on the NCF model that is part of the MLPerf 0.5 training benchmark, the NVIDIA T4 delivers 12x more performance and 24x higher energy efficiency than a CPU server (single Intel Xeon Gold 6140).
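To make the idea behind NCF concrete, here is a minimal forward-pass sketch: user and item embeddings are concatenated and fed through a small MLP that predicts a preference score. This is an illustrative toy, not the MLPerf benchmark model; every size and weight below is an arbitrary assumption.

```python
import numpy as np

# Toy NCF-style scorer: embeddings + tiny MLP. Illustrative only; in a real
# system the embeddings and MLP weights are learned from user-item interactions.
rng = np.random.default_rng(0)

n_users, n_items, dim = 100, 50, 8
user_emb = rng.normal(size=(n_users, dim))   # one vector per user
item_emb = rng.normal(size=(n_items, dim))   # one vector per item

W1 = rng.normal(size=(2 * dim, 16))          # hidden layer weights
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1))                # output layer weights
b2 = np.zeros(1)

def score(user_id, item_id):
    """Predicted preference of user_id for item_id, in (0, 1)."""
    x = np.concatenate([user_emb[user_id], item_emb[item_id]])
    h = np.maximum(x @ W1 + b1, 0.0)                   # ReLU hidden layer
    return float(1.0 / (1.0 + np.exp(-(h @ W2 + b2)[0])))  # sigmoid score

# Recommendation = rank all items for a user by predicted score.
scores = [score(7, i) for i in range(n_items)]
top3 = np.argsort(scores)[::-1][:3]
print("top-3 items for user 7:", top3)
```

Inference here is one embedding lookup plus two small matrix multiplies per user-item pair; at data-center scale that multiply-heavy workload is exactly what the throughput comparison above is measuring.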