Intel today launched its new Xeon Scalable processors, claiming an average 65% performance boost over its prior Broadwell chips.
The Intel Xeon Scalable processor family spans four levels of scale and performance. Options include:
- Intel Xeon Platinum Processor: Offers the best performance, hardware-enhanced security, and business agility for mission-critical applications, real-time analytics, and AI. It is workload-optimized for general-purpose compute and multi-cloud deployments, delivers performance for the most demanding storage and networking workloads, and is designed for 99.999% service availability.
- Intel Xeon Gold Processor: Offers great performance, fast memory, more I/O and accelerator engines, and advanced RAS, bringing significant workload-optimized performance and platform improvements for general-purpose compute.
- Intel Xeon Silver Processor: Offers efficient performance with solid compute capability for a moderate range of workloads.
- Intel Xeon Bronze Processor: Offers entry-level performance for lighter workloads and delivers capability upgrades over the Intel Xeon E3.
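For context, the "five nines" availability figure Intel cites for Platinum-class systems leaves only minutes of allowable downtime per year. A quick back-of-the-envelope sketch (assuming a 365-day year):

```python
# Downtime budget implied by 99.999% ("five nines") service availability
availability = 0.99999
minutes_per_year = 365 * 24 * 60               # 525,600 minutes in a non-leap year
downtime_minutes = (1 - availability) * minutes_per_year
print(round(downtime_minutes, 2))              # about 5.26 minutes of downtime per year
```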
The top-end versions of the new Xeon Scalable family nudge ahead of rival AMD's recently released Epyc CPUs in performance but don't pack as much I/O. Nevertheless, AMD's Epyc and a rising tide of ARM-based server chips from Qualcomm and others are expected to find significant footholds in the broad and diverse cloud computing sector.
According to data released by Intel, the Platinum 8180 and 8160 versions of Skylake edged AMD's Epyc 7601 by 2% to 28% in performance and by 12% to 22% in performance/watt on the SPECint_rate2006 benchmark. As always, bear in mind that these results could be skewed by Intel's tendency to use optimized compilers for its benchmarks, while AMD uses standard ones.
The high-end 8100 series packs 28 cores running at up to 3.6 GHz, with up to 48 PCIe 3.0 lanes and six channels of DDR4-2666 memory. AMD's high-end Epyc packs up to 32 cores, and all nine members of the family support 128 PCIe 3.0 lanes and eight DDR4-2666 channels.
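Those channel counts translate directly into theoretical peak memory bandwidth. A rough sketch, assuming the standard 64-bit (8-byte) DDR4 channel width and ignoring real-world efficiency losses:

```python
def peak_bw_gbs(channels: int, transfers_mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DDR4 bandwidth in GB/s (64-bit channel = 8 bytes per transfer)."""
    return channels * transfers_mts * bytes_per_transfer / 1000.0

skylake = peak_bw_gbs(6, 2666)   # six DDR4-2666 channels  -> ~128 GB/s per socket
epyc = peak_bw_gbs(8, 2666)      # eight DDR4-2666 channels -> ~171 GB/s per socket
print(round(skylake), round(epyc))
```

The gap is one reason AMD pitches single-socket Epyc against dual-socket Intel systems in memory-bound workloads.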
Intel showed tests with two dozen companies, each with different workloads. Results ranged from Skylake beating Broadwell chips by 1.4x for Ansys manufacturing software to 2.2x for apps using Skylake along with Intel's proprietary Optane solid-state memory drives.
Last month, AMD showed a range of benchmarks for Epyc that averaged around 45% more performance than Broadwell. However, the server sector includes a wide range of markets and requirements, many where Intel will have an edge and a few where AMD may score hits.
For example, AMD hopes to use its advantage in PCIe and DDR4 to replace dual-socket Broadwell with single-socket Epyc servers. However, Skylake's new AVX-512 vector processing extensions far outstrip Epyc's abilities in floating-point intensive jobs.
Skylake uses a single processor die with a separate I/O chip. Epyc packs four die in a package including I/O, giving AMD greater flexibility and lower cost at the expense of latency in some operations.
Skylake's gains come from a list of generally step-wise innovations, including an upgraded microarchitecture and an expanded instruction set. The chips use a mesh network-on-chip that Intel says provides more bandwidth and more consistent low latencies than its prior ring buses. The mesh organizes cores, on-chip cache banks, memory controllers, and I/O controllers in rows and columns, with wires and switches connecting them at each intersection to allow traffic to turn. The result, Intel says, is improved performance and greater energy efficiency, "similar to a well-designed highway system that lets traffic flow at the optimal speed without congestion."
AVX-512 doubles single- and double-precision throughput over Broadwell's AVX2, to 64 and 32 flops/cycle, respectively. It does so within the same power levels, although the chips run at lower clock frequencies than Intel's previous parts when executing these instructions.
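The 64 and 32 flops/cycle figures fall out of the vector width and FMA count. A sketch of the arithmetic, assuming two 512-bit FMA units per core (the configuration on high-end Skylake SKUs) and counting a fused multiply-add as two floating-point operations:

```python
fma_units = 2              # 512-bit FMA units per core (assumption: high-end SKUs)
ops_per_fma = 2            # a fused multiply-add counts as two flops
lanes_fp32 = 512 // 32     # 16 single-precision lanes per 512-bit vector
lanes_fp64 = 512 // 64     # 8 double-precision lanes per 512-bit vector

sp_flops_per_cycle = fma_units * lanes_fp32 * ops_per_fma   # 64
dp_flops_per_cycle = fma_units * lanes_fp64 * ops_per_fma   # 32
print(sp_flops_per_cycle, dp_flops_per_cycle)               # prints: 64 32
```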
The extensions support up to 85.33 INT8 and 64 FP32 operations/cycle per core, boosting performance on machine-learning training and inference jobs, said Intel, adding that Skylake gives a 3.4x boost over Broadwell in integer general matrix multiply tasks.
Rather than expanding cache size, Intel revamped its approach to caching. The chips carry slightly less cache overall, but it is better optimized for data-center workloads.
The companion I/O chip for Skylake, called Lewisburg, supports four 10G Ethernet ports compared to a single GE port for the Broadwell I/O chip. It is also the first to integrate the crypto and compression functions that Intel calls its Quick Assist technology.
Intel also boosted its processor bus, now called the Ultra Path Interconnect (UPI), from 9.6 to 10.4 GTransfers/second. High-end chips carry up to three of the links.
The Xeon Scalable family consists of nearly 50 versions made in different variants of Intel's 14-nm process. Prices range from nearly $9,000 for eight-socket versions to about $400 for entry-level parts.
Power consumption ranges from 70 to 205 W. The low-end Bronze 3100 series uses up to eight cores running at 1.7 GHz and supports DDR4-2133 but lacks Hyper-Threading.
A handful of the new devices put Intel's Omni-Path interconnect in the same package as the processor for high-performance computing. Intel is sampling versions that put an FPGA in the package but won't ship those products until early next year.