Intel presented a technical paper showing that application
kernels run up to 14 times faster on an Nvidia GeForce
GTX 280 than on an Intel Core i7 960.
The paper, entitled "Debunking the 100X GPU vs. CPU myth:
an evaluation of throughput computing on CPU and GPU," was
presented by Intel at the International Symposium on
Computer Architecture (ISCA) in Saint-Malo, France.
Processing the ever-growing data in a timely manner has
made throughput computing an important aspect for emerging
applications. According to Intel's analysis of a set of
important throughput computing kernels, there is an ample
amount of parallelism in these kernels which makes them
suitable for today's multi-core CPUs and GPUs.
In the past few years there have been many studies claiming
GPUs deliver substantial speedups (between 10X and 1000X)
over multi-core CPUs on these kernels. To understand where
such large performance differences come from, Intel
performed a performance analysis and found that, after
applying optimizations appropriate for both CPUs and GPUs,
the performance gap between an Nvidia GTX 280 processor and
the Intel Core i7-960 processor narrows to only 2.5x on
average.
In the paper, Intel also discussed optimization techniques
for both CPU and GPU, analyzed which architectural features
contributed to performance differences between the two
architectures, and recommended a set of architectural
features that improve architectural efficiency for
throughput kernels.
Commenting on Intel's paper, Andy Keane, Nvidia's General
Manager of GPU Computing, wrote on the company's blog:
"It's a rare day in the world of technology when a company
you compete with stands up at an important conference and
declares that your technology is *only* up to 14 times
faster than theirs. In fact in all the 26 years I've been
in this industry, I can't recall another time I've seen a
company promote competitive benchmarks that are an order of
magnitude slower."
Keane said Intel used Nvidia's previous generation of GPU,
the Nvidia GTX 280 processor, for the study and that
the codes that were run on the GTX 280 were run right
out of the box, without any optimization. In fact, it's
actually unclear from the technical paper what codes were
run and how they were compared between the GPU and CPU.
However, Keane admitted that "the 100x GPU vs CPU Myth"
claim is true.
"Not *all* applications can see this kind of speed up, some
just have to make do with an order of magnitude performance
increase," he said. "But, 100X speed ups, and beyond, have
been seen by hundreds of developers," he added, giving
examples of developers that have achieved speed ups of more
than 100x in their applications.
"The real myth here is that multi-core CPUs are easy for
any developer to use and see performance improvements,"
Nvidia's representative said.
"Undergraduate students learning parallel programming at
M.I.T. disputed this when they looked at the performance
increase they could get from different processor types and
compared this with the amount of time they needed to spend
in re-writing their code. According to them, for the same
investment of time as coding for a CPU, they could get more
than 35x the performance from a GPU. Despite substantial
investments in parallel computing tools and libraries,
efficient multi-core optimization remains in the realm of
experts like those Intel recruited for its analysis. In
contrast, the CUDA parallel computing architecture from
NVIDIA is a little over 3 years old and already hundreds of
consumer, professional and scientific applications are
seeing speedups ranging from 10 to 100x using NVIDIA GPUs."
Keane added that industry experts and the development
community are voting by porting their applications to GPUs.
Interestingly enough, Nvidia's Chief Scientist Bill Dally
received the 2010 Eckert-Mauchly Award for his pioneering
work in architecture for parallel computing at the same
event.