Thursday, October 28, 2010
AMD Details High Floating Point Capabilities Of Upcoming Bulldozer Chips

One of the most interesting features planned for AMD's next generation core architecture, which features the new "Bulldozer" core, is something called the "Flex FP," which promises to deliver tremendous floating point capabilities for technical and financial applications.

For those of you not familiar with floating point math, this is the heavy-duty arithmetic, not the simple 1+1 integer math that most applications use. In computing, floating point is a way of representing numbers with fractional parts, or numbers too large or too small to be represented as integers. A number is generally stored approximately, as a fixed number of significant digits scaled by an exponent. AMD claims that its "Flex FP" floating point unit could give technical and financial applications that rely heavily on floating point math large performance increases over existing architectures, as well as far more flexibility.
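To make the "significant digits scaled by an exponent" idea concrete, here is a small Python sketch using the standard library's `math.frexp`, which splits a float into a significand and a power-of-two exponent. It assumes ordinary Python floats (IEEE 754 double precision on mainstream platforms) and says nothing about how Bulldozer itself stores values; the helper name `decompose` is purely illustrative.

```python
import math
from fractions import Fraction

def decompose(x: float) -> tuple[float, int]:
    """Return (m, e) with x == m * 2**e and 0.5 <= |m| < 1 (for nonzero finite x)."""
    return math.frexp(x)

# Huge and tiny magnitudes use the same format: a significand plus an exponent.
for value in (6.0, 1.5e300, 3.0e-300):
    m, e = decompose(value)
    assert math.ldexp(m, e) == value  # scaling back by the power of two is exact here
    print(f"{value!r} = {m!r} * 2**{e}")

# Most decimal fractions can only be stored approximately in binary.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(0.1 + 0.2 == 0.3)                  # False
```

For example, 6.0 comes back as 0.75 * 2**3: the significand carries the digits and the exponent carries the scale.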

Flex FP is a single floating point unit that is shared between the two integer cores in a module (so a 16-core AMD "Interlagos" processor would have 8 Flex FP units). Each Flex FP has its own scheduler; it does not rely on the integer scheduler to schedule FP commands, nor does it take integer resources to schedule 256-bit executions. This helps keep the FP unit full as floating point commands arrive. AMD says that Intel's and other competitors' architectures use a single shared scheduler to issue both integer and floating point commands, rather than dedicated schedulers for integer and floating point execution.
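The module arrangement boils down to simple bookkeeping. The sketch below assumes, purely for illustration, that cores are numbered so that cores 2k and 2k+1 share module k; real operating systems may number cores differently, and `CoreMapping`, `flex_fp_units` and `map_core` are hypothetical names.

```python
from dataclasses import dataclass

CORES_PER_MODULE = 2  # per the article: two integer cores share one Flex FP

@dataclass(frozen=True)
class CoreMapping:
    core: int
    module: int   # module index, which is also the index of its shared Flex FP
    sibling: int  # the other integer core using the same Flex FP

def flex_fp_units(core_count: int) -> int:
    """Number of Flex FP units on a chip built entirely from two-core modules."""
    if core_count % CORES_PER_MODULE:
        raise ValueError("core count must be a multiple of the module size")
    return core_count // CORES_PER_MODULE

def map_core(core: int) -> CoreMapping:
    """Assumed numbering: cores 2k and 2k+1 live in module k."""
    module = core // CORES_PER_MODULE
    sibling = core ^ 1  # flips the lowest bit (4 <-> 5, 6 <-> 7); valid only for 2-core modules
    return CoreMapping(core=core, module=module, sibling=sibling)

print(flex_fp_units(16))  # 8, matching the 16-core "Interlagos" example
print(map_core(5))        # CoreMapping(core=5, module=2, sibling=4)
```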

"Bulldozer" will also support a number of instruction set extensions, including SSSE3, SSE4.1 and SSE4.2, AVX, AES, FMA4, XOP, PCLMULQDQ and others.

One of these new instruction set extensions, AVX, can handle 256-bit FP executions. However, there is no such thing as a 256-bit number: single precision values are 32 bits and double precision values are 64 bits, so a 256-bit operation works on several values in parallel. With today's standard 128-bit FPUs, you execute four single precision operations or two double precision operations in parallel per cycle. With AVX you can double that, executing eight 32-bit operations or four 64-bit operations per cycle, but only if your application supports AVX. If it doesn't support AVX, then that flashy new 256-bit FPU only executes in 128-bit mode (half the throughput). That is, unless you have a Flex FP.
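The lane arithmetic described here can be written out directly. This Python sketch assumes one SIMD operation per cycle, as the article does, and models "code without AVX" simply as code limited to 128-bit operations; the function names are illustrative.

```python
SINGLE_BITS, DOUBLE_BITS = 32, 64

def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of `element_bits` that fit side by side in one register."""
    return register_bits // element_bits

def effective_bits(unit_bits: int, code_uses_avx: bool) -> int:
    """Width a single instruction stream actually uses: SSE-only code issues 128-bit ops."""
    return unit_bits if code_uses_avx else min(unit_bits, 128)

for width in (128, 256):
    print(f"{width}-bit: {lanes(width, SINGLE_BITS)} single or "
          f"{lanes(width, DOUBLE_BITS)} double precision values per operation")

# A 256-bit unit running pre-AVX code: half of its lanes sit idle.
print(lanes(effective_bits(256, code_uses_avx=False), SINGLE_BITS))  # 4
```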

In today's typical data center workloads, the bulk of the processing is integer and a smaller portion is floating point. So, in most cases you don't want one massive 256-bit floating point unit per core consuming all of that die space and all of that power just to sit around watching the integer cores do all of the heavy lifting. By sharing one 256-bit floating point unit between every two cores, AMD can keep die size and power consumption down, helping hold down both acquisition cost and long-term management costs.

The Flex FP unit is built on two 128-bit FMAC units. The FMAC building blocks are quite robust on their own: each FMAC can execute one fused multiply-accumulate (FMAC), one FADD, or one FMUL per cycle.

"When you compare that to competitive solutions that can only do an FADD on their single FADD pipe or an FMUL on their single FMUL pipe, you start to see the power of the Flex FP: whether 128-bit or 256-bit, there is flexibility for your technical applications. With FMAC, the multiplication or addition commands don't start to stack up like a standard FMUL or FADD; there is flexibility to handle either math on either unit," said John Fruehe, the director of product marketing for server/workstation products at AMD.
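A back-of-the-envelope way to see Mr. Fruehe's point is to count issue cycles for a mix of independent FADDs and FMULs. This Python sketch assumes one operation per pipe per cycle and ignores latency, dependencies and scheduler limits, so it only captures the best case; the function names are hypothetical.

```python
def cycles_dedicated(n_add: int, n_mul: int) -> int:
    """One FADD-only pipe plus one FMUL-only pipe: each op must wait for its own pipe."""
    return max(n_add, n_mul)

def cycles_flexible(n_add: int, n_mul: int, pipes: int = 2) -> int:
    """`pipes` FMAC pipes, each able to accept an FADD or an FMUL every cycle."""
    total = n_add + n_mul
    return (total + pipes - 1) // pipes  # ceiling division

# Balanced, FADD-heavy and FADD-only instruction mixes.
for n_add, n_mul in [(50, 50), (90, 10), (100, 0)]:
    print(f"{n_add} FADD + {n_mul} FMUL: dedicated={cycles_dedicated(n_add, n_mul)} "
          f"flexible={cycles_flexible(n_add, n_mul)} cycles")
```

In this simplified model the two layouts tie on a perfectly balanced mix (50 cycles each), while a lopsided, "FADD limited" mix of 90 FADDs and 10 FMULs takes 90 cycles on dedicated pipes but 50 when either pipe can take either operation.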

Here are some additional benefits:

* Non-destructive DEST via FMA4 support (which helps reduce register pressure)
* Higher accuracy (via elimination of the intermediate rounding step)
* Can accommodate FMUL OR FADD ops (if an app is FADD limited, then both FMACs can do FADDs, etc), which is a huge benefit
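The first two points can be illustrated in Python. The sketch emulates a fused multiply-add with exact rational arithmetic (`fractions.Fraction`) rather than calling hardware, and models registers as a dictionary; the register names and the `fma4`/`fma3` helpers are illustrative, not real instruction encodings. Real three-operand FMA instructions come in several operand orderings, but each overwrites one of its sources, which is the behavior modeled here. Finite inputs only.

```python
from fractions import Fraction

def unfused(a: float, b: float, c: float) -> float:
    """Separate FMUL then FADD: the product is rounded before the addition."""
    return a * b + c

def fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: a*b + c computed exactly, then rounded once (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Higher accuracy: the exact product 1 - 2**-60 rounds to 1.0 if rounded on its own.
a, b, c = 1 + 2**-30, 1 - 2**-30, -1.0
print(unfused(a, b, c))                 # 0.0
print(fused(a, b, c) == -(2.0 ** -60))  # True: the exact answer survives

def fma4(regs: dict[str, float], dest: str, s1: str, s2: str, s3: str) -> None:
    """Four-operand form: separate destination, all three sources left intact."""
    regs[dest] = fused(regs[s1], regs[s2], regs[s3])

def fma3(regs: dict[str, float], dest_and_addend: str, s1: str, s2: str) -> None:
    """Simplified three-operand form: the addend register is overwritten by the result."""
    regs[dest_and_addend] = fused(regs[s1], regs[s2], regs[dest_and_addend])

regs = {"xmm1": a, "xmm2": b, "xmm3": c}
fma4(regs, "xmm0", "xmm1", "xmm2", "xmm3")
print(regs["xmm3"])                  # -1.0: the addend is still available for reuse
fma3(regs, "xmm3", "xmm1", "xmm2")
print(regs["xmm3"] == regs["xmm0"])  # True, but the original addend is gone
```

With the three-operand form, keeping the old addend means copying it to a spare register first, which is the register pressure that the non-destructive destination avoids.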

The new AES instructions allow hardware to accelerate the large base of applications that use this standard encryption algorithm (FIPS 197). The "Bulldozer" Flex FP is able to execute these instructions, which operate on 16 bytes at a time, at a rate of one per cycle, which provides twice the bandwidth of current offerings, AMD added.
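The "16 bytes per cycle" figure refers to individual AES round instructions, not complete encryptions. FIPS 197 defines a 16-byte block and 10, 12 or 14 rounds for 128-, 192- and 256-bit keys, so this Python sketch divides accordingly. It assumes each round is one instruction, that enough independent blocks are available (for example, in counter mode) to keep the unit busy at one instruction per cycle, and it ignores key expansion and the initial key XOR; the clock speed is a made-up figure, not a "Bulldozer" specification.

```python
AES_BLOCK_BYTES = 16                      # FIPS 197 block size
AES_ROUNDS = {128: 10, 192: 12, 256: 14}  # FIPS 197 rounds per key size

def round_bytes_per_cycle(instr_per_cycle: float = 1.0) -> float:
    """The article's figure: one 16-byte round instruction per cycle."""
    return AES_BLOCK_BYTES * instr_per_cycle

def block_bytes_per_cycle(key_bits: int, instr_per_cycle: float = 1.0) -> float:
    """Idealized bytes of finished ciphertext per cycle if every round costs one instruction."""
    return AES_BLOCK_BYTES * instr_per_cycle / AES_ROUNDS[key_bits]

HYPOTHETICAL_CLOCK_HZ = 3.0e9  # illustrative only, not a Bulldozer specification

for key_bits in (128, 256):
    gbps = block_bytes_per_cycle(key_bits) * HYPOTHETICAL_CLOCK_HZ / 1e9
    print(f"AES-{key_bits}: {block_bytes_per_cycle(key_bits):.2f} bytes/cycle, ~{gbps:.1f} GB/s")
```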

By having a shared Flex FP, AMD holds down the power budget for the processor. This allows AMD to add more integer cores within the same power budget: by sharing FP resources (which are often idle in any given cycle), AMD can add more integer execution resources (which are more often busy, with commands waiting in line). In fact, the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.

"The Flex FP gives you the best of both worlds: performance where you need it yet smart enough to save power when you don't need it," Mr. Fruehe said.
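AMD's 2% figure translates into a simple average-power estimate. This Python sketch assumes a linear model (full power while busy, 2% of peak while idle) and uses a made-up 10 W peak; neither the model nor the wattage comes from AMD.

```python
IDLE_FRACTION = 0.02  # per the article: active idle power is about 2% of peak

def average_fp_power(peak_watts: float, busy_fraction: float) -> float:
    """Average FP-unit power under a simple linear busy/idle model."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    idle_fraction = 1.0 - busy_fraction
    return peak_watts * (busy_fraction + IDLE_FRACTION * idle_fraction)

HYPOTHETICAL_PEAK_W = 10.0  # illustrative number only

for busy in (0.0, 0.1, 0.5, 1.0):
    print(f"busy {busy:.0%}: {average_fp_power(HYPOTHETICAL_PEAK_W, busy):.2f} W")
```

On a server that is mostly running integer code the busy fraction is small, so under this model the average stays close to the idle floor.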

The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. In each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, OR each of the integer cores can execute 128-bit commands simultaneously. This is not something hard-coded in the BIOS or in the application; it can change with each processor cycle to meet the needs at that moment. Since servers spend most of their time executing integer commands, when a set of FP commands needs to be dispatched there is a good chance that only one core needs the FP unit, so that core has all 256 bits to schedule.
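The per-cycle sharing can be sketched as a tiny arbitration function. This Python model treats the Flex FP as two 128-bit pipes and lets each core request 0, 128 or 256 bits in a given cycle. When both cores together want more than the two pipes can supply, it gives each core one pipe; that tie-breaking rule is a hypothetical simplification, not AMD's documented scheduler behavior.

```python
PIPE_BITS = 128  # per the article: the Flex FP is built on two 128-bit FMAC units
PIPES = 2

def allocate(request_a: int, request_b: int) -> tuple[int, int]:
    """Bits of FP width granted to core A and core B for one cycle.

    Each request is 0 (no FP work), 128 or 256 bits.
    """
    for request in (request_a, request_b):
        if request not in (0, 128, 256):
            raise ValueError("requests must be 0, 128 or 256 bits")
    pipes_a, pipes_b = request_a // PIPE_BITS, request_b // PIPE_BITS
    if pipes_a + pipes_b > PIPES:
        # Oversubscribed (hypothetical rule): each core gets one pipe, the rest waits a cycle.
        pipes_a, pipes_b = 1, 1
    return pipes_a * PIPE_BITS, pipes_b * PIPE_BITS

print(allocate(256, 0))    # (256, 0): a lone FP user gets the whole unit
print(allocate(128, 128))  # (128, 128): both cores run 128-bit code side by side
print(allocate(256, 256))  # (128, 128): both want 256 bits, so each gets half this cycle
```

Because the function is evaluated per cycle, the split can change every cycle, which mirrors the article's point that nothing is fixed in the BIOS or the application.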

Floating point operations typically have longer latencies, so the FP unit's utilization is usually much lower; two threads can easily interleave their operations with minimal performance impact. So sharing doesn't necessarily present a dramatic trade-off, given the types of operations being handled.
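Why interleaving costs so little can be shown with a simple timing model. This Python sketch assumes a single fully pipelined FP pipe that starts at most one operation per cycle, threads that each run a chain of dependent operations issued round-robin, and a hypothetical 5-cycle latency; real "Bulldozer" latencies and scheduling are not modeled.

```python
def completion_cycle(chains: int, ops_per_chain: int, latency: int) -> int:
    """Cycle at which the last result is ready when `chains` independent dependency
    chains share one pipelined unit, issuing round-robin at most one op per cycle.

    Each op in a chain waits `latency` cycles for its predecessor's result, so a
    round of issues (one op per chain) repeats every max(latency, chains) cycles.
    """
    if chains < 1 or latency < 1:
        raise ValueError("need at least one chain and a latency of at least one cycle")
    if ops_per_chain == 0:
        return 0
    round_length = max(latency, chains)
    last_issue = (ops_per_chain - 1) * round_length + (chains - 1)
    return last_issue + latency

LATENCY = 5  # hypothetical FP latency in cycles

one_thread = completion_cycle(chains=1, ops_per_chain=100, latency=LATENCY)
two_threads = completion_cycle(chains=2, ops_per_chain=100, latency=LATENCY)
print(one_thread, two_threads)  # 500 501
```

In this model a single thread leaves the pipe idle four cycles out of five while it waits for results, so a second thread's operations fit into those gaps and the pair finishes only one cycle later than one thread alone.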

Also, each of AMD's pipes can handle SSE or AVX as well as FMUL, FADD, or FMAC, providing the greatest flexibility for any given application. Existing applications will be able to take full advantage of AMD's hardware, with potential for further improvement by adopting the new instruction set extensions, the company said.

"Obviously, there are benefits of recompiled code that will support the new AVX instructions. But, if you think that you will have some older 128-bit FP code hanging around (and let's face it, you will), then don't you think having a flexible floating point solution is a more flexible choice for your applications? For applications to support the new 256-bit AVX capabilities they will need to be recompiled; this takes time and testing, so I wouldn't expect to see rapid movement to AVX until well after platforms are available on the streets. That means in the meantime, as we all work through this transition, having flexibility is a good thing. Which is why we designed the Flex FP the way that we have," Mr. Fruehe added.
