Thursday, October 28, 2010
AMD Details High Floating Point Capabilities Of Upcoming Bulldozer Chips


One of the most interesting features planned for AMD's next-generation architecture, which is built around the new "Bulldozer" core, is something called the "Flex FP," a floating point unit that AMD says will deliver tremendous floating point capabilities for technical and financial applications.

For those of you not familiar with floating point math, this is the heavy-duty arithmetic behind technical computing, not the simple 1+1 integer math that most applications use. In computing, floating point describes a system for representing numbers that would be too large or too small to be represented as integers. Numbers are generally represented approximately, to a fixed number of significant digits, and scaled using an exponent. AMD claims that its "Flex FP" floating point unit could offer technical and financial applications that rely heavily on floating point math huge increases in performance over existing architectures, as well as far more flexibility.
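
As a rough illustration of the "significant digits scaled by an exponent" idea, the short Python sketch below splits a double into its mantissa and exponent and shows two consequences of the approximate representation. The helper name decompose is made up for this example; math.frexp is the standard library call that does the work.

```python
import math

def decompose(x: float) -> tuple[float, int]:
    """Split x into (mantissa, exponent) so that x == mantissa * 2**exponent."""
    return math.frexp(x)

# 6.0 is stored as 0.75 scaled by 2**3.
mantissa, exponent = decompose(6.0)
print(mantissa, exponent)

# Only a fixed number of significant digits is kept, so some results are approximate.
print(0.1 + 0.2 == 0.3)

# Past 2**53 a double can no longer represent every integer exactly.
big = float(2**53)
print(big + 1.0 == big)
```

The last two lines print False and True respectively, which is the approximation the paragraph above describes.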

Flex FP is a single floating point unit that is shared between the two integer cores in a module (so a 16-core AMD "Interlagos" would have 8 Flex FP units). Each Flex FP has its own scheduler; it does not rely on the integer scheduler to schedule FP commands, nor does it take integer resources to schedule 256-bit executions. This helps to ensure that the FP unit stays full as floating point commands arrive. AMD says that architectures from Intel and other competitors have used a single scheduler for both integer and floating point, which means that both kinds of commands are issued by one shared scheduler rather than by dedicated integer and floating point schedulers.

The new instruction set extensions will include SSSE3, SSE 4.1 and 4.2, AVX, AES, FMA4, XOP, PCLMULQDQ and others.

One of these new instruction set extensions, AVX, can handle 256-bit FP executions. However, there is no such thing as a 256-bit command: single precision commands are 32-bit and double precision commands are 64-bit. With today's standard 128-bit FPUs, you execute four single precision commands or two double precision commands in parallel per cycle. With AVX you can double that, executing eight 32-bit commands or four 64-bit commands per cycle, but only if your application supports AVX. If it doesn't support AVX, then that flashy new 256-bit FPU only executes in 128-bit mode (half the throughput). That is, unless you have a Flex FP.
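
The lane counts quoted above follow from dividing the register width by the element width. A minimal Python sketch of that arithmetic (the function name lanes is just a label for this example):

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits

for register_bits in (128, 256):        # SSE-style width vs. AVX width
    single = lanes(register_bits, 32)   # single precision elements
    double = lanes(register_bits, 64)   # double precision elements
    print(f"{register_bits}-bit: {single} single precision or {double} double precision per operation")
```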

In today's typical data center workloads, the bulk of the processing is integer and a smaller portion is floating point. So, in most cases you don't want one massive 256-bit floating point unit per core consuming all of that die space and all of that power just to sit around watching the integer cores do all of the heavy lifting. By sharing one 256-bit floating point unit between every two cores, AMD can keep die size and power consumption down, helping hold down both the acquisition cost and long-term management costs.

The Flex FP unit is built on two 128-bit FMAC (fused multiply-accumulate) units. The FMAC building blocks are quite robust on their own. Each one can perform an FMAC, an FADD or an FMUL per cycle.

"When you compare that to competitive solutions that can only do an FADD on their single FADD pipe or an FMUL on their single FMUL pipe, you start to see the power of the Flex FP: whether 128-bit or 256-bit, there is flexibility for your technical applications. With FMAC, the multiplication or addition commands don't start to stack up like a standard FMUL or FADD; there is flexibility to handle either math on either unit," said John Fruehe, the director of product marketing for server/workstation products at AMD.

Here are some additional benefits:

* Non-destructive DEST via FMA4 support (which helps reduce register pressure)
* Higher accuracy (via elimination of intermediate round step)
* Can accommodate FMUL OR FADD ops (if an app is FADD limited, then both FMACs can do FADDs, etc), which is a huge benefit
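
The accuracy point in the list above can be seen without FMAC hardware. The Python sketch below is an emulation, not AMD's implementation: it uses exact rational arithmetic from the standard fractions module to mimic a fused operation that rounds only once, and compares it with a separate multiply followed by an add. The function names and the input values are chosen for this example only.

```python
from fractions import Fraction

def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate FMUL and FADD: the product is rounded, then the sum is rounded."""
    return a * b + c

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMAC: compute a*b + c exactly, then round to a double once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused path loses the small term before the addition.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_then_add(a, b, c))    # intermediate rounding wipes out the result
print(fused_mul_add(a, b, c))   # single rounding keeps -2**-60
```

With these inputs the unfused path returns 0.0 while the emulated fused path returns -2**-60, because the intermediate rounding step is skipped.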

The new AES instructions allow hardware to accelerate the large base of applications that use this type of standard encryption (FIPS 197). The "Bulldozer" Flex FP is able to execute these instructions, which operate on 16 bytes at a time, at a rate of one per cycle, which provides 2X more bandwidth than current offerings, AMD added.

By having a shared Flex FP, the power budget for the processor is held down. This allows AMD to add more integer cores into the same power budget. By sharing FP resources (that are often idle in any given cycle) AMD can add more integer execution resources (which are more often busy with commands waiting in line). In fact, the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.

"The Flex FP gives you the best of both worlds: performance where you need it yet smart enough to save power when you don't need it," Mr. Fruehe said.

The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. With each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, or each of the integer cores can execute 128-bit commands simultaneously. This is not something hard coded in the BIOS or in the application; it can change with each processor cycle to meet the needs at that moment. Since servers spend most of their time executing integer commands, when a set of FP commands needs to be dispatched it is likely that only one core needs to do so, and that core then has all 256 bits to schedule.
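
The per-cycle sharing described above can be pictured with a toy model. The Python sketch below is purely illustrative: the allocate function and its arbitration policy are assumptions made for this example, not AMD's actual scheduling logic. It only captures the two cases in the text, one core taking both 128-bit pipes or each core taking one.

```python
PIPE_BITS = 128   # width of one FMAC pipe, per the article
NUM_PIPES = 2     # two FMAC pipes per Flex FP unit, per the article
TOTAL_BITS = PIPE_BITS * NUM_PIPES

def allocate(request_a: int, request_b: int) -> tuple[int, int]:
    """Return the FP bits granted to core A and core B for one cycle.

    Each request is the FP width a core wants this cycle: 0, 128 or 256.
    Hypothetical policy: a lone requester gets what it asks for (up to
    both pipes); when both cores request, each gets one 128-bit pipe.
    """
    for request in (request_a, request_b):
        if request not in (0, PIPE_BITS, TOTAL_BITS):
            raise ValueError(f"unsupported request width: {request}")
    if request_a and request_b:
        return PIPE_BITS, PIPE_BITS
    return request_a, request_b

# The split can differ from one cycle to the next.
for cycle, (req_a, req_b) in enumerate([(256, 0), (128, 128), (0, 256), (256, 256)]):
    print(cycle, allocate(req_a, req_b))
```

In this simplified model a core that asks for 256 bits while the other core is also busy is granted only 128 bits that cycle; how real hardware sequences the remaining half is outside the scope of the sketch.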

Floating point operations typically have longer latencies, so their utilization is typically much lower; two threads are able to easily interleave with minimal performance impact. So the idea of sharing doesn't necessarily present a dramatic trade-off because of the types of operations being handled.

Also, each of AMD's pipes can handle SSE or AVX as well as FMUL, FADD, or FMAC, providing the greatest flexibility for any given application. Existing apps will be able to take full advantage of AMD's hardware, with potential for improvement by leveraging the new ISAs, the company said.

"Obviously, there are benefits of recompiled code that will support the new AVX instructions. But, if you think that you will have some older 128-bit FP code hanging around (and let's face it, you will), then don't you think having a flexible floating point solution is a more flexible choice for your applications? For applications to support the new 256-bit AVX capabilities they will need to be recompiled; this takes time and testing, so I wouldn't expect to see rapid movement to AVX until well after platforms are available on the streets. That means in the meantime, as we all work through this transition, having flexibility is a good thing. Which is why we designed the Flex FP the way that we have," Mr. Fruehe added.


CDRINFO.COM 1998-2014 - All rights reserved