According to ARM, the Mali-G71 graphics processing unit (GPU) delivers a 50 percent increase in graphics performance, a 20 percent increase in power efficiency and 40 percent more performance per mm².
The Mali-G71 scales up to 32 shader cores, twice as many as the previous-generation premium GPU IP, the Mali-T880. The uplift means the Mali-G71 surpasses the performance of many discrete GPUs found in today's mid-range laptops. The product is also fully coherent, which helps simplify software development and improve efficiency. It is suited to powering VR and AR experiences on mobile devices, and silicon providers including HiSilicon, MediaTek and Samsung Electronics have already taken licenses.
Bifrost, the third-generation ARM GPU architecture, is the foundation of the Mali-G71. The architecture is optimized for Vulkan 1.0 and OpenCL 2.0 Full Profile, and it supports fine-grained buffers and shared virtual memory, enabled through full hardware coherency.
The Mali Bifrost architecture includes clause shaders, which group sets of instructions into defined blocks that run to completion atomically and uninterrupted. This reduces pressure on the register file, decreasing the power it consumes, and also contributes to area reduction by simplifying the control logic in the execution units.
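The grouping idea can be illustrated with a small Python sketch. This is a toy model and not ARM's actual compiler or hardware behavior: the opcode names, the clause length limit of 8 and the rule that a variable-latency operation ends a block are all assumptions made for illustration. It only shows how grouping cuts the number of points at which control logic has to make a scheduling decision.

```python
from typing import List

MAX_CLAUSE_LEN = 8                       # assumed limit, for illustration only
VARIABLE_LATENCY = {"load", "texture"}   # hypothetical opcodes that end a block


def build_clauses(instructions: List[str],
                  max_len: int = MAX_CLAUSE_LEN) -> List[List[str]]:
    """Group a flat instruction list into blocks that run uninterrupted."""
    clauses: List[List[str]] = []
    current: List[str] = []
    for op in instructions:
        current.append(op)
        # Close the block at a variable-latency op or when it is full.
        if op in VARIABLE_LATENCY or len(current) == max_len:
            clauses.append(current)
            current = []
    if current:
        clauses.append(current)
    return clauses


program = ["mul", "add", "texture", "mul", "add", "add", "load", "mul", "store"]
clauses = build_clauses(program)

# Scheduling decisions: one per instruction without grouping, one per block with it.
print(f"{len(program)} instructions grouped into {len(clauses)} blocks")
for clause in clauses:
    print(clause)
```

In this model the nine-instruction program needs three scheduling decisions instead of nine, which is the kind of control-logic simplification the article describes.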
The Bifrost architecture also brings quad-based vectorization. Midgard GPUs used SIMD vectorization, which executed one thread at a time in each pipeline stage and depended heavily on the shader code issuing vector instructions. Quad vectorization allows four threads to be executed together, sharing control logic. This makes it much easier to fill the execution units, achieving close to 100% utilization, and better fits the way developers now write shader code.
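The utilization argument can be made concrete with a rough Python model. It assumes a four-lane execution unit and describes each shader instruction only by its vector width (1 to 4 components); both function names and the instruction mix are hypothetical, and real pipelines are far more involved than this.

```python
def simd_utilization(op_widths, lanes=4):
    """One thread at a time: a width-w instruction fills only w of the lanes."""
    used = sum(min(w, lanes) for w in op_widths)
    return used / (lanes * len(op_widths))


def quad_utilization(op_widths, active_threads=4, lanes=4):
    """Four threads share control logic, one thread per lane.

    A width-w instruction is issued as w scalar steps; in every step each
    active thread keeps its own lane busy.
    """
    steps = sum(op_widths)
    return (steps * active_threads) / (steps * lanes)


# Hypothetical shader: mostly scalar and short-vector operations.
shader_ops = [1, 3, 2, 4, 1]
simd = simd_utilization(shader_ops)
quad = quad_utilization(shader_ops)
print(f"SIMD: {simd:.0%}, quad: {quad:.0%}")
```

With this instruction mix the one-thread-at-a-time model fills 11 of 20 lane slots (55%), while the quad model stays full as long as all four threads are active, independent of how wide the individual instructions are.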
The previous generation of high-performance mobile GPUs was scalable from 1 to 16 cores. To reflect the growing performance requirements of mobile devices, the Mali-G71 is scalable from 1 to 32 cores. This scalability means superior graphics performance is available across a wider-than-ever range of devices, from DTVs through high-end smartphones right up to cutting-edge VR headsets, either mobile-based or standalone.
The Mali family of GPUs also has VR optimization features. Front buffer rendering allows developers to bypass the usual off-screen buffers and render directly to the front buffer, saving time and reducing latency. Mali also supports the 'multiview' API extensions, which allow the application to submit the draw commands for a frame to the driver once and have the driver instantiate the necessary work for each eye. This reduces the CPU time required in both the application and the driver. On Midgard- and Bifrost-based Mali GPUs, ARM further optimizes the vertex processing work by running the parts of the vertex shader that do not depend on the eye once and sharing the results between the eyes.
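The vertex-sharing optimization can be sketched in plain Python. This is a conceptual model only, not the multiview API or the Mali driver: the function names are hypothetical, a simple translation stands in for the model transform, and a horizontal shift stands in for the per-eye view matrix.

```python
def world_position(vertex, model_offset):
    """Eye-independent part of the vertex shader: same result for both eyes."""
    return tuple(v + o for v, o in zip(vertex, model_offset))


def eye_position(world, eye_offset_x):
    """Eye-dependent part: shift the view horizontally for one eye."""
    x, y, z = world
    return (x - eye_offset_x, y, z)


def draw_multiview(vertices, model_offset, eye_offsets):
    """Single submission; the per-eye work is instantiated inside this call."""
    per_eye = [[] for _ in eye_offsets]
    shared_evals = 0
    for v in vertices:
        world = world_position(v, model_offset)   # computed once per vertex
        shared_evals += 1
        for i, offset in enumerate(eye_offsets):  # reused for every eye
            per_eye[i].append(eye_position(world, offset))
    return per_eye, shared_evals


triangle = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]
views, shared = draw_multiview(triangle, model_offset=(0, 0, 2), eye_offsets=(-1, 1))
print(f"eye-independent evaluations: {shared} (instead of {len(triangle) * 2})")
```

The application calls `draw_multiview` once, and the eye-independent stage runs three times rather than six for the two eyes.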
The recently released Mali-DP650 display processor can already handle 4K content, and the Mali-G71 allows this content to be streamed to a TV without any loss of quality.
The Mali-G71 was also designed and optimized as part of a complete system: it works within the Mali Multimedia Suite, with the CCI-550 interconnect providing full coherency between CPU and GPU.
The new ARM Cortex-A73
At under 0.65 mm² per core (on a 10nm FinFET process technology), the Cortex-A73 is the smallest and most efficient 'big' ARMv8-A core. It delivers the highest performance in the mobile power envelope, at frequencies up to 2.8GHz. Its mobile microarchitecture enables a 30 percent uplift in sustained performance and power efficiency over the Cortex-A72.
Size and efficiency improvements enhance the ability of silicon providers to use the Cortex-A73 in ARM big.LITTLE configurations. ARM says that ten silicon partners have licensed the Cortex-A73 so far, including HiSilicon, Marvell and MediaTek.
Starting with the basics, the Cortex-A73 supports the full ARMv8-A architecture. It includes ARM TrustZone technology, NEON, virtualization and cryptography.
The Cortex-A73 includes a 128-bit AMBA 4 ACE interface, enabling integration in ARM big.LITTLE systems, either with the efficient Cortex-A53 in premium designs or with ARM's latest ultra-efficient Cortex-A35 processor in mid-range and more cost-constrained designs.
The Cortex-A73 micro-architecture includes several performance optimizations. It supports a 64kB instruction cache, branch prediction based on advanced algorithms, and high-performance instruction prefetching. The main performance improvements are actually implemented in the data memory system. It uses advanced L1 and L2 data prefetchers, with complex pattern detection. ARM has also optimized the store buffer for continuous write streams and increased the data cache to 64kB without any timing impacts.
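To give a feel for what a data prefetcher does, here is a minimal Python sketch of a basic stride detector. It is far simpler than the pattern detection the article attributes to the Cortex-A73 and is not a description of that hardware; the function name and the `confirm` threshold are assumptions for illustration.

```python
def stride_prefetch(addresses, confirm=2):
    """Return the addresses a simple stride prefetcher would request.

    After the same non-zero stride has been seen `confirm` times in a row,
    the next address in the sequence is prefetched on every further match.
    """
    prefetches = []
    last = None      # previous address seen
    stride = None    # most recent address delta
    streak = 0       # how many times in a row that delta repeated
    for addr in addresses:
        if last is not None:
            delta = addr - last
            if delta == stride and delta != 0:
                streak += 1
            else:
                stride = delta
                streak = 1
            if streak >= confirm:
                prefetches.append(addr + stride)
        last = addr
    return prefetches


# A streaming access pattern with 64-byte steps, then an unrelated access.
accesses = [0, 64, 128, 192, 1000]
predicted = stride_prefetch(accesses)
print(predicted)
```

In this trace the detector locks onto the 64-byte stride after two matching steps and requests the next line ahead of each access, then stops when the pattern breaks.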
These enhancements translate into a performance uplift of up to 10% in mobile use cases compared to the Cortex-A72 at iso-frequency. Moreover, the Cortex-A73 consistently beats the Cortex-A72 in all memory workloads by at least 15%, which improves performance across multiple applications, operating system operations and complex compute workloads such as NEON processing.
The Cortex-A73, combined with the Cortex-A53, will power the next generation of premium smartphones, typically in an octa-core configuration. For mid-range smartphones, a hexa-core big.LITTLE configuration with a dual-core Cortex-A73 and a quad-core Cortex-A53 or Cortex-A35 enables a significant performance uplift in the same or less area than an octa-core Cortex-A53. Compared to an octa-core Cortex-A53, the Cortex-A73 hexa-core delivers 30% more multi-core performance and twice the single-thread peak performance, which reduces response time for applications such as web browsing and interface scrolling.
Later this year and in 2017, ARM's partners will integrate the Cortex-A73 into premium smartphones, tablets, clamshells, DTVs, and a wide range of consumer devices.