"We expect to go to production later next year, the design is progressing very rapidly," Huang said. "There are hundreds of engineers working on it."
The Kepler processor will be three to four times faster than Nvidia's current Fermi chip generation, Huang said. Later, in 2013, Nvidia will make an even greater leap, he said, predicting that a chip called Maxwell will be 10 to 12 times as powerful as Fermi.
Nvidia faces pressure as Intel early next year launches its Sandy Bridge chips, which will combine a traditional core processor with a graphics processor.
Nvidia is moving into the fast-growing mobile business, combining low-powered processors designed by ARM with its own graphics processors under the Tegra brand name for telephones and tablets.
Huang described Tegra 2, which is coming later this year, as "phenomenal".
AMD also plans to release a microprocessor with integrated graphics.
GPU Acceleration in Large-scale CFD on the Tsubame Supercomputer
At the same event, Professor Takayuki Aoki from the Tokyo Institute of Technology gave a look under the hood of the university's Tsubame supercomputer, which has been used (among other things) in collaboration with a number of agencies in Japan to provide complex computational fluid dynamics modeling.
The Tsubame is notable because it leverages GPU clusters. The Tsubame 1.0 uses Tesla S1070 in a 680-GPU cluster. With it, scientists have seen speedups of up to 80x in problems like weather modeling. Coming in December, the next-generation Tsubame 2.0 will use the Tesla 2050 with 4224 GPUs and provide performance of more than 3 PFLOPS.
Performance metrics are an important scorecard for supercomputers. But Professor Aoki explained that the performance actually achieved depends on the application as well as on tuning and optimization, and he gave some insights into how his group has been able to improve results. With a full GPU implementation, Tsubame saw acceleration of 10x to 100x over CPU-only performance, though he noted that the exact figure varied by application.
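One common way to see why the same hardware can deliver such different gains is Amdahl's law: the overall speedup is limited by whatever share of the runtime is not accelerated. The short Python sketch below is only an illustration of that general relationship, not Professor Aoki's analysis; the offloaded fractions and the 100x kernel factor are made-up values chosen for the example.

```python
def amdahl_speedup(parallel_fraction, accel_factor):
    """Overall speedup when only part of the runtime is accelerated.

    parallel_fraction: share of the original runtime moved to the GPU (0..1).
    accel_factor: how much faster that share runs once offloaded.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accel_factor)


# Hypothetical applications, all given the same 100x kernel acceleration.
for fraction in (0.50, 0.90, 0.99, 1.00):
    overall = amdahl_speedup(fraction, 100.0)
    print(f"{fraction:.0%} offloaded -> {overall:.1f}x overall")
```

Under these assumptions, only an application that runs almost entirely on the GPU approaches the full kernel speedup, which is consistent with the point that results depend heavily on the application.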
With such intensive computation, any performance increase makes a huge difference, as does any bottleneck. Communication between GPUs across cluster nodes adds overhead, and Professor Aoki explained an overlapping technique his group has developed at Tokyo Tech to deal with this.
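The article does not describe the Tokyo Tech technique in detail, so the following is only a generic sketch of what overlapping communication with computation usually means. It is written in plain Python with threads rather than CUDA, and every name in it (fetch_halo, step_overlapped, the two-subdomain split) is hypothetical. A 1D grid is split between two "nodes"; the exchange of boundary values is started in the background, the interior points that need no remote data are updated in the meantime, and only the edge points wait for the transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def step_reference(u):
    """One smoothing step over the whole 1D grid (end values held fixed)."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def fetch_halo(neighbor, index):
    """Stand-in for a network transfer: read one boundary value from a neighbor."""
    return neighbor[index]


def step_overlapped(left, right, pool):
    """Advance two subdomains one step, overlapping halo exchange with interior work."""
    # 1. Start the (simulated) communication asynchronously.
    halo_for_left = pool.submit(fetch_halo, right, 0)
    halo_for_right = pool.submit(fetch_halo, left, len(left) - 1)

    # 2. Meanwhile, update interior points, which depend only on local data.
    new_left, new_right = left[:], right[:]
    for i in range(1, len(left) - 1):
        new_left[i] = (left[i - 1] + left[i] + left[i + 1]) / 3.0
    for i in range(1, len(right) - 1):
        new_right[i] = (right[i - 1] + right[i] + right[i + 1]) / 3.0

    # 3. Wait for the halo values, then finish the points on the shared edge.
    new_left[-1] = (left[-2] + left[-1] + halo_for_left.result()) / 3.0
    new_right[0] = (halo_for_right.result() + right[0] + right[1]) / 3.0
    return new_left, new_right


grid = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
with ThreadPoolExecutor(max_workers=2) as pool:
    left, right = step_overlapped(grid[:4], grid[4:], pool)

# The split, overlapped update matches the single-domain update.
print(left + right == step_reference(grid))
```

On a real GPU cluster the background step would be an asynchronous device-to-host copy plus a network transfer rather than a thread reading a list, but the ordering is the same idea: hide the transfer time behind work that does not depend on it.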
CUDA Boosts Video Editing with Adobe Premiere Pro
Video editing is one of the disciplines making striking use of CUDA technology, as presenters from Adobe made clear during a session at the GPU Technology Conference. Al Mooney, product manager for Adobe's Premiere Pro video editing application, and computer scientist Steve Hoeg showed a room full of video professionals how the parallel processing capabilities of NVIDIA's GPUs are making it possible to enjoy peak performance while tapping the full capabilities of Premiere Pro.
Video editors face numerous processing challenges, including huge data streams, hundreds of formats, and increasing pressure to deliver bigger results in shorter timeframes. Despite these challenges, editors across the board want to be able to play back any format or frame rate without conversion, mix multiple formats and frame rates within the same project, apply visual effects without slowing down the project, and deliver to multiple output formats quickly.
This combination of factors is pushing the processing needs of video editors to new levels, and CUDA is helping Premiere Pro, which represents the largest commercial CUDA deployment to date, deliver the goods.
As an example of the profound impact CUDA is having, Hoeg noted that GPUs are able to process color correction 75 times faster than the latest Intel Nehalem processors. "CPUs just cannot cope with this," Hoeg said.
Additionally, whereas CPUs have struggled to apply video effects while simultaneously decoding source files, Hoeg said that offloading the effects functions to GPUs enables CPUs to handle the decoding easily. What's more, CUDA makes it possible for users of Premiere Pro to work straight from those source files rather than having to convert them to a pre-determined format.
"CUDA has allowed us to do a lot of things previously not possible," said Hoeg.
To drive that point home, Mooney demonstrated how Premiere Pro handles playback of five simultaneous video streams: a superimposed figure standing before four videos running in separate quadrants in the background. First, he shut off the GPU acceleration and showed how the software would freeze and blip during playback. Then, once he turned the GPU acceleration back on, playback proceeded seamlessly.