Fujitsu today announced the specifications for the A64FX CPU to be featured in the post-K computer, a supercomputer being developed by Fujitsu and RIKEN as a successor to the K computer.
The organizations are striving to achieve post-K application execution performance up to 100 times that of the K computer.
A64FX is the first CPU to adopt the Scalable Vector Extension (SVE), an extension of the Armv8-A instruction set architecture for supercomputers. The chip offers peak performance of over 2.7 TFLOPS, demonstrating superior HPC and AI performance.
Fujitsu made the announcement at Hot Chips 30, an international symposium on high-performance processors and related technologies, held in Silicon Valley, California, from August 19 to 21.
Post-K is the successor to the K computer, which in 2011 ranked first on the TOP500 list of the world's supercomputers. Fujitsu and RIKEN are developing post-K with the aim of starting operation around 2021.
A64FX is the high-performance CPU that will be used in post-K. It offers a number of features, including broad utility supporting a wide range of applications, massive parallelization through the Tofu interconnect, low power consumption, and mainframe-class reliability.
A64FX is the first CPU to adopt the SVE of Arm Limited's Armv8-A instruction set architecture, extended for supercomputers. Fujitsu collaborated with Arm, contributing to the development of the SVE as a lead partner, and adopted the results in the A64FX.
Fujitsu developed the microarchitecture of the A64FX by building on the technology of its previous supercomputers, mainframes, and UNIX servers. Hardware technology that draws out the high memory bandwidth of high-performance stacked memory lets the system make efficient use of the CPU's high-performance computational units, delivering high application execution performance. The CPUs will be directly connected by the proprietary Tofu interconnect developed for the K computer, improving parallel performance.

The system can provide peak double-precision (64 bit) floating-point performance of over 2.7 TFLOPS, with twice that throughput for single precision (32 bit) and four times that throughput for half precision (16 bit). In other words, applications that can use single-precision or half-precision operations can get results even faster. Fujitsu has also enhanced computational performance for 16 bit and 8 bit integer operations. Accordingly, this CPU is suited to a wide range of fields such as big data and AI, not just the computer simulations at which traditional supercomputers excel.
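The precision scaling described here matches the number of elements that fit in one 512-bit SVE vector, the width given in the specifications. The following Python sketch illustrates that arithmetic. It assumes peak throughput scales linearly with lane count, which is an interpretation of the quoted figures rather than something the announcement states, and the names `lanes` and `peak_tera_ops` are hypothetical helpers introduced only for this example.

```python
# Figures quoted in the announcement; "over" means these are lower bounds.
SIMD_WIDTH_BITS = 512       # SVE vector width from the specifications
PEAK_FP64_TERA_OPS = 2.7    # double-precision peak, tera-operations per second


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one SIMD vector."""
    return SIMD_WIDTH_BITS // element_bits


def peak_tera_ops(element_bits: int) -> float:
    """Lower bound on peak throughput, ASSUMING linear scaling with lane count."""
    return PEAK_FP64_TERA_OPS * lanes(element_bits) / lanes(64)


# Print the lane count and the implied lower bound for each element width.
for bits in (64, 32, 16, 8):
    print(f"{bits:2d} bit: {lanes(bits):2d} lanes, over {peak_tera_ops(bits):.1f} tera-ops/s")
```

Under this assumption, halving the element width doubles the lane count and therefore the quoted lower bound, which is consistent with the 2x and 4x figures for single and half precision.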
|Instruction Set Architecture
||Armv8.2-A SVE (512-bit wide SIMD)
|Number of cores
||48 computing cores, 4 assistant cores
|Process Technology
||7 nm FinFET
|Number of Transistors
||About 8.7 billion transistors
|Peak Performance (TOPS)
||Double precision (64 bit) floating point operations: over 2.7 TOPS (DGEMM execution efficiency over 90%)
Single precision (32 bit) floating point operations: over 5.4 TOPS
Half precision (16 bit) floating point operations/16 bit integer operations: over 10.8 TOPS
8 bit integer operations: over 21.6 TOPS
|Peak Memory Bandwidth
||1024 GB/second (STREAM Triad execution efficiency over 80%)
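The efficiency figures in the specifications can be combined with the peak values to estimate rough floors on sustained performance. The Python sketch below does that arithmetic and also derives the peak memory-bandwidth-to-compute ratio. Because both the peaks and the efficiencies are quoted as "over" a value, the results are approximate lower bounds, not measurements. The constant and variable names are hypothetical and exist only for this example.

```python
# Values quoted in the specifications; each is stated as "over" the figure.
PEAK_FP64_TFLOPS = 2.7           # double-precision peak
PEAK_MEM_BW_GBS = 1024.0         # peak memory bandwidth, GB/second
DGEMM_EFFICIENCY = 0.90          # DGEMM execution efficiency
STREAM_TRIAD_EFFICIENCY = 0.80   # STREAM Triad execution efficiency

# Rough floors on sustained performance: peak multiplied by efficiency.
dgemm_tflops = PEAK_FP64_TFLOPS * DGEMM_EFFICIENCY
stream_gbs = PEAK_MEM_BW_GBS * STREAM_TRIAD_EFFICIENCY

# Peak bytes of memory traffic available per double-precision operation.
# 1 TFLOPS = 1000 GFLOPS, so both quantities are expressed in giga units.
bytes_per_flop = PEAK_MEM_BW_GBS / (PEAK_FP64_TFLOPS * 1000.0)

print(f"DGEMM sustained:  over ~{dgemm_tflops:.2f} TFLOPS")
print(f"STREAM Triad:     over ~{stream_gbs:.1f} GB/second")
print(f"Peak bytes/FLOP:  ~{bytes_per_flop:.2f}")
```

The bytes-per-FLOP ratio is a common way to compare how well memory bandwidth keeps pace with compute. Since the quoted compute peak is a lower bound, the true ratio may be slightly lower than the value computed here.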