Intel and DOE to Deliver First U.S. Exascale Supercomputer

Intel and the U.S. Department of Energy (DOE) will deliver the first supercomputer with a performance of one exaFLOP in the United States.

The system, named “Aurora,” is being developed at DOE’s Argonne National Laboratory near Chicago and will be used to advance scientific research and discovery. The contract is valued at more than $500 million, and Intel and sub-contractor Cray Inc. will deliver the system to Argonne in 2021.

The Aurora system’s exaFLOP of performance (a quintillion, or 10^18, floating-point operations per second), combined with an ability to handle both traditional high-performance computing (HPC) and artificial intelligence (AI), will give researchers a set of tools to address scientific problems at exascale. These research projects include developing extreme-scale cosmological simulations, discovering new approaches for drug response prediction, and discovering materials for more efficient organic solar cells.

The foundation of the Aurora supercomputer will be new Intel technologies designed specifically for the convergence of artificial intelligence and high-performance computing at extreme computing scale. These include a future generation of the Intel Xeon Scalable processor, Intel’s Xe compute architecture, a future generation of Intel Optane DC Persistent Memory, and Intel’s One API software. Aurora will use Cray’s next-generation supercomputer system, code-named “Shasta,” which will comprise more than 200 cabinets and include Cray’s Slingshot high-performance scalable interconnect and the Shasta software stack optimized for Intel architecture.

Slingshot is Ethernet-compatible and can connect to third-party data storage while still letting systems run at exascale speeds. With 64-port switches capable of 25.6 Tb/s each, it can support up to a quarter million endpoints with a network diameter of just three hops, only one of which requires an optical cable.
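As a rough sanity check on the switch figures quoted above, the per-port rate can be worked out from the two numbers given. This is only a sketch: it assumes the 25.6 Tb/s figure is an aggregate over all 64 ports and counts both directions of traffic, which the article does not state. The variable names are illustrative.

```python
# Figures quoted in the article.
PORTS_PER_SWITCH = 64
SWITCH_BANDWIDTH_GBPS = 25_600  # 25.6 Tb/s expressed in Gb/s

# Assumption (not stated in the article): the switch figure is the
# aggregate over all ports and counts both directions of each port.
per_port_bidirectional_gbps = SWITCH_BANDWIDTH_GBPS / PORTS_PER_SWITCH
per_port_unidirectional_gbps = per_port_bidirectional_gbps / 2

print(f"Per port, both directions: {per_port_bidirectional_gbps:.0f} Gb/s")
print(f"Per port, one direction:   {per_port_unidirectional_gbps:.0f} Gb/s")
```

Under that assumption each port would carry 200 Gb/s in each direction. If the quoted figure counts one direction only, the per-port rate would be twice that.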

In addition to Ethernet compatibility, Slingshot features advanced adaptive routing, first-of-a-kind congestion control, and sophisticated quality-of-service capabilities. It also reduces the network diameter from five hops (in the current Cray XC generation) to three, with benefits to cost, latency, power, and sustained bandwidth.

Intel's so-called Xe product is believed to be a member of the GPU family being designed under Raja Koduri. Representatives of Intel and Argonne declined to reveal the size, power consumption, and architecture of Aurora or details of the Xe accelerator. Alternatively, the Aurora accelerator might be an FPGA, a multicore x86 part replacing the discontinued Xeon Phi, or a hybrid. Intel could also leverage technology that it acquired with Nervana and intends to ship later this year in versions for AI training and inference.

The DOE plans to spend a total of $1.8 billion on three exascale systems. It is expected to announce soon that the team of IBM and Nvidia will build the two other systems using their future Power processors and GPUs. The IBM/Nvidia systems would be follow-ons to Summit and Sierra, currently ranked as the two most powerful supercomputers in the world at 143 and 94 petaflops, respectively.

The DOE’s other two exascale systems, called Frontier and El Capitan, will be built at the Oak Ridge and Lawrence Livermore national laboratories, respectively, where the Summit and Sierra systems currently run.

All of the U.S. systems are expected to deliver a peak performance of up to 1.3 exaflops, using up to 8 petabytes of memory and consuming about 40 MW.
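The upper-bound figures above imply a power efficiency and a memory-to-compute ratio that are easy to work out. The sketch below does that arithmetic. It assumes decimal units (1 petabyte = 10^15 bytes) and takes the quoted "up to" and "about" values at face value, so the results are ballpark estimates rather than specifications. The function names are illustrative.

```python
# Upper-bound figures quoted in the article for the U.S. exascale systems.
PEAK_FLOPS = 1.3e18      # 1.3 exaflops
POWER_WATTS = 40e6       # about 40 MW
MEMORY_BYTES = 8e15      # 8 petabytes, assuming decimal units


def gflops_per_watt(flops: float, watts: float) -> float:
    """Peak efficiency in billions of operations per second per watt."""
    return flops / watts / 1e9


def bytes_per_flop(memory_bytes: float, flops: float) -> float:
    """Memory capacity available per unit of peak compute rate."""
    return memory_bytes / flops


efficiency = gflops_per_watt(PEAK_FLOPS, POWER_WATTS)
ratio = bytes_per_flop(MEMORY_BYTES, PEAK_FLOPS)

print(f"Peak efficiency: {efficiency:.1f} GFLOPS per watt")
print(f"Memory per peak FLOP/s: {ratio:.4f} bytes")
```

Taken at face value, the quoted figures work out to roughly 32.5 GFLOPS per watt at peak.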

The world’s current most powerful machine, the Summit supercomputer at Oak Ridge National Laboratory in Tennessee, contains chips from IBM and Nvidia.

Nvidia’s chips are found in five of the world’s current top-10 supercomputers, though always alongside chips from its rivals, according to TOP500, which ranks the machines.

The source of chips for supercomputers has become a factor in trade tensions between the United States and China. The world’s third-fastest supercomputer, the Sunway TaihuLight in Wuxi, China, uses chips developed domestically; its 10.6 million proprietary cores deliver 93 petaflops. One of China’s exascale projects is a follow-on to the TaihuLight.

The second China effort is a follow-on to the Tianhe-2A in Guangzhou, which currently uses Intel Xeon CPUs alongside Matrix-2000 accelerators designed by China’s National University of Defense Technology. The Tianhe-2A is ranked fourth worldwide at 61 petaflops. China’s third exascale effort is a new project led by server maker Sugon, with x86 chips believed to be derived from AMD’s Zen core as part of a 2016 joint venture.