In a keynote address opening the Hot Chips symposium, AMD CTO Mark Papermaster unveiled the first details about the Steamroller x86 CPU core.
Papermaster delivered a vision for the coming "Surround Computing Era," and unveiled new processor architecture details which will enable technologies and design methodologies driving the next era in computing.
"Surround Computing imagines a world without keyboards or mice, where natural user interfaces based on voice and facial recognition redefine the PC experience, and where the cloud and clients collaborate to synthesize exabytes of image and natural language data. The ultimate goal is devices that deliver intelligent, relevant, contextual insight and value that improves consumers' everyday life in real time through a variety of futuristic applications. AMD is leading the quest for devices that understand and anticipate users' needs, are driven by natural user interfaces, and that disappear seamlessly into the background," said Papermaster during his opening remarks.
Papermaster explained that the Surround Computing Era will rely on robust "plug-and-play" IP portfolios including central processing units (CPUs), graphics processing units (GPUs), fixed function logic, and interconnect fabric. He also unveiled key details of AMD's upcoming CPU architecture, codenamed "Steamroller," while underscoring the benefits of the industry-standard Heterogeneous System Architecture (HSA) that enables software developers to easily assign scalar- and parallel-compute workloads to the most appropriate compute units, and therefore optimize power.
Steamroller is the third instantiation of AMD's Bulldozer architecture. It takes fundamentals from the Bulldozer/Piledriver architectures and offers a set of improvements on top of them.
The new chip addresses an issue with the front end of Bulldozer and Piledriver: the shared fetch and decode hardware.
According to AMD, Steamroller is duplicating the decode hardware in each module, so now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. However, the obvious tradeoff is
In addition, AMD has made improvements to integer performance, the shared L1 instruction cache, and the L1-to-L2 interface.
Finally on the caching front, Steamroller introduces a dynamically resizable L2 cache and has reduced L2/L3 cache latencies.
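To make the decode change concrete, here is a toy model of peak decode slots per core in a two-core module. It is only a sketch: the function name and parameters are hypothetical, it assumes the shared decoder in the older designs is also 4-wide, and it counts ideal decode bandwidth only, so it says nothing about real-world performance.

```python
def decode_slots(cycles, cores=2, width=4, shared=True):
    """Return the peak number of instructions each core could decode.

    Hypothetical helper for illustration; not based on AMD tooling.
    """
    per_core = [0] * cores
    for cycle in range(cycles):
        if shared:
            # One decoder serves the whole module, so the cores
            # take turns and each gets it every other cycle.
            per_core[cycle % cores] += width
        else:
            # Each core has its own decoder and can use it every cycle.
            for core in range(cores):
                per_core[core] += width
    return per_core


shared_result = decode_slots(8, shared=True)
dedicated_result = decode_slots(8, shared=False)
print("shared decoder:    ", shared_result)
print("dedicated decoders:", dedicated_result)
```

In this idealized model, each core's peak decode bandwidth doubles when the decoder is no longer shared. Actual gains depend on how often both cores are decode-bound at the same time.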
It seems that while Piledriver focused more on improving power efficiency, Steamroller will make a bigger impact on performance, as the core will provide a 30 percent improvement in operations per cycle.
The architecture is slated to debut in 2013 on GlobalFoundries' 28nm bulk process.
Regarding HSA, AMD expects to announce soon new members of the Heterogeneous System Architecture group it launched earlier this year. The group could finish "within months" the first draft of an application programming interface for enabling merged graphics, x86 and other cores in SoCs.
"We are investing and trying to bring the industry with us to bring apps" to AMD and other SoCs, said Mark Papermaster. "Without a common API and a path from high level languages [to hardware] you will not see broad adoption and new apps," he told the annual gathering of several hundred processor designers.
"It?s not a pure speeds-and-feeds race--it hasn't been for several years--it's a solutionS problem," Patermaster said. "It?s about accelerating the apps stack," he said.
The API will help AMD deliver SoCs that support fast switching and a common memory pool among graphics and x86 cores.
Papermaster declined to comment on whether AMD plans to make any ARM-based SoCs for smartphones, servers or embedded systems. ARM is a part of HSA, and AMD has said it will put an ARM Cortex-A5 core in an SoC next year to enable security based on ARM's TrustZone technology.
With the HSA group, AMD and ARM have formed an open alliance to attack their mutual competitor, Intel. Nvidia is also a rival to AMD and thus not a likely HSA member, even though it is a key customer for ARM.