NVIDIA today announced the latest version of the NVIDIA
CUDA Toolkit for developing parallel applications using
NVIDIA GPUs.
NVIDIA said that the new CUDA 4.0 Toolkit was designed to
make parallel programming easier and to enable more
developers to port their applications to GPUs.
These are the three main features of the new toolkit:
- NVIDIA GPUDirect 2.0 Technology -- Offers support for
peer-to-peer communication among GPUs within a single
server or workstation. This speeds up multi-GPU programming
and improves application performance.
- Unified Virtual Addressing (UVA) -- Provides a single
merged-memory address space for the main system memory and
the GPU memories, enabling quicker and easier parallel
programming.
- Thrust C++ Template Performance Primitives Libraries --
Provides a collection of open source C++ parallel
algorithms and data structures that ease programming for
C++ developers. NVIDIA says that with Thrust, routines such
as parallel sorting run 5X to 100X faster than with the
Standard Template Library (STL) or Intel Threading Building
Blocks (TBB).
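To make the GPUDirect 2.0 and UVA items above more concrete, here is a minimal sketch built on CUDA runtime calls that first appeared in CUDA 4.0: `cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, `cudaMemcpyPeer` and `cudaPointerGetAttributes`. It assumes a UVA-capable setup (a 64-bit application on a Fermi-class or newer GPU). The helper name `copy_between_gpus` and the 0x2A fill pattern are hypothetical, chosen only for illustration. On a machine with a single GPU the same code simply copies within that device.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies `bytes` bytes from a buffer on srcDev to a
// buffer on dstDev, then reads the result back to the host to verify it.
bool copy_between_gpus(int srcDev, int dstDev, size_t bytes) {
    void* src = nullptr;
    void* dst = nullptr;

    cudaSetDevice(srcDev);
    if (cudaMalloc(&src, bytes) != cudaSuccess) return false;
    cudaMemset(src, 0x2A, bytes);  // fill the source with a known byte

    cudaSetDevice(dstDev);
    if (cudaMalloc(&dst, bytes) != cudaSuccess) {
        cudaSetDevice(srcDev);
        cudaFree(src);
        return false;
    }

    // GPUDirect 2.0 peer-to-peer: if the hardware allows it, let dstDev
    // (the current device) access srcDev's memory directly. Without peer
    // access, cudaMemcpyPeer still works but may be staged through the host.
    if (srcDev != dstDev) {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev);
        if (canAccess &&
            cudaDeviceEnablePeerAccess(srcDev, 0) == cudaErrorPeerAccessAlreadyEnabled) {
            cudaGetLastError();  // "already enabled" is harmless; clear it
        }
    }
    bool ok = cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes) == cudaSuccess;

    // UVA: the runtime can report which device owns a pointer...
    cudaPointerAttributes attr;
    ok = ok && cudaPointerGetAttributes(&attr, dst) == cudaSuccess &&
         attr.device == dstDev;

    // ...so cudaMemcpyDefault can infer the copy direction from the pointers.
    std::vector<unsigned char> host(bytes);
    ok = ok && cudaMemcpy(host.data(), dst, bytes, cudaMemcpyDefault) == cudaSuccess;
    for (size_t i = 0; ok && i < bytes; ++i) ok = (host[i] == 0x2A);

    cudaFree(dst);  // current device is still dstDev
    cudaSetDevice(srcDev);
    cudaFree(src);
    return ok;
}
```

For example, `copy_between_gpus(0, 1, 1 << 20)` would move 1 MiB from GPU 0 to GPU 1. Passing `cudaMemcpyDefault` instead of an explicit direction is only valid on systems that support unified virtual addressing.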
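As a rough illustration of the Thrust item, the following minimal sketch copies host data into a `thrust::device_vector` and sorts it on the GPU with `thrust::sort`. The helper name `gpu_sort_random` and the use of `std::rand` to generate test data are illustrative assumptions. The sketch does not measure or reproduce the 5X to 100X speedups NVIDIA cites.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: sorts n pseudo-random ints on the GPU and returns
// the sorted values in host memory.
thrust::host_vector<int> gpu_sort_random(std::size_t n, unsigned seed) {
    thrust::host_vector<int> h(n);
    std::srand(seed);
    for (std::size_t i = 0; i < n; ++i) h[i] = std::rand();

    thrust::device_vector<int> d(h);     // copy host data to the GPU
    thrust::sort(d.begin(), d.end());    // parallel sort on the device
    thrust::host_vector<int> sorted(d);  // copy the result back to the host
    return sorted;
}
```

The same `thrust::sort` call also accepts `thrust::host_vector` iterators, in which case Thrust runs the sort on the host instead.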
The CUDA 4.0 architecture release also includes a number of
other key features and capabilities:
- MPI Integration with CUDA Applications -- Modified MPI
implementations automatically move data to and from GPU
memory over InfiniBand when an application makes an MPI
send or receive call.
- Multi-thread Sharing of GPUs -- Multiple CPU host threads
can share contexts on a single GPU, making it easier for
multi-threaded applications to share a single GPU.
- Multi-GPU Sharing by Single CPU Thread -- A single CPU
host thread can access all GPUs in a system. Developers can
easily coordinate work across multiple GPUs for tasks such
as "halo" exchange.
- New NPP Image and Computer Vision Library -- A set of
image transformation operations that enable rapid
development of imaging and computer vision applications.
- Automated performance analysis in the Visual Profiler
- New features in cuda-gdb and added support for Mac OS X
- Added support for C++ features like new/delete and
- New GPU binary disassembler
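The multi-thread sharing item can be sketched as follows. The sketch assumes a CUDA 4.0 or later runtime, in which all host threads in a process share one context per device. Several `std::thread` workers each select device 0, launch a kernel on their own stream, and write into their own slice of a buffer that the main thread allocated. The names `fill` and `threads_share_gpu` are illustrative, not part of any NVIDIA API.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical kernel: writes `value` into the first n elements of out.
__global__ void fill(int* out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical helper: numThreads host threads share device 0. Each one
// fills its own slice of a buffer allocated by the calling thread, which
// works because the threads share the same per-device context.
bool threads_share_gpu(int numThreads, int perThread) {
    cudaSetDevice(0);
    size_t total = size_t(numThreads) * perThread;
    int* buf = nullptr;
    if (cudaMalloc(&buf, total * sizeof(int)) != cudaSuccess) return false;

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([=] {
            cudaSetDevice(0);  // every host thread selects the shared GPU
            cudaStream_t s;
            cudaStreamCreate(&s);
            int* slice = buf + size_t(t) * perThread;
            fill<<<(perThread + 255) / 256, 256, 0, s>>>(slice, perThread, t);
            cudaStreamSynchronize(s);
            cudaStreamDestroy(s);
        });
    }
    for (auto& w : workers) w.join();

    std::vector<int> host(total);
    cudaMemcpy(host.data(), buf, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(buf);

    // Slice t should contain the value t everywhere.
    for (size_t i = 0; i < total; ++i)
        if (host[i] != int(i / perThread)) return false;
    return true;
}
```

Giving each host thread its own stream lets the kernels from different threads run independently on the shared GPU.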
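For the single-thread multi-GPU item, here is a minimal sketch of a one-dimensional halo exchange. One host thread loops over every device with `cudaSetDevice`, launches work on each, and then copies boundary cells between neighboring GPUs with `cudaMemcpyPeer`. The layout (one halo cell on each side of each device's slice) and the names `fill_interior` and `single_thread_halo_exchange` are assumptions made for illustration. Real applications typically overlap such copies with computation using streams.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: a[0] and a[interior + 1] are halo cells, and
// a[1..interior] are the cells owned by this GPU.
__global__ void fill_interior(float* a, int interior, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < interior) a[i + 1] = value;
}

// Hypothetical helper: GPU d fills its interior with d + 1, then
// neighboring GPUs swap boundary cells. Returns true when every halo
// holds its neighbor's value (or 0 at the outer edges).
bool single_thread_halo_exchange(int interior) {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) return false;
    size_t bytes = (interior + 2) * sizeof(float);
    std::vector<float*> buf(n, nullptr);

    for (int d = 0; d < n; ++d) {
        cudaSetDevice(d);  // switch GPUs from the same host thread
        cudaMalloc(&buf[d], bytes);
        cudaMemset(buf[d], 0, bytes);
        fill_interior<<<(interior + 255) / 256, 256>>>(buf[d], interior, float(d + 1));
    }
    for (int d = 0; d < n; ++d) { cudaSetDevice(d); cudaDeviceSynchronize(); }

    // Halo exchange between each pair of neighbors d and d + 1.
    for (int d = 0; d + 1 < n; ++d) {
        // last interior cell of d -> left halo of d + 1
        cudaMemcpyPeer(buf[d + 1], d + 1, buf[d] + interior, d, sizeof(float));
        // first interior cell of d + 1 -> right halo of d
        cudaMemcpyPeer(buf[d] + interior + 1, d, buf[d + 1] + 1, d + 1, sizeof(float));
    }
    for (int d = 0; d < n; ++d) { cudaSetDevice(d); cudaDeviceSynchronize(); }

    bool ok = true;
    for (int d = 0; d < n; ++d) {
        float left = -1.0f, right = -1.0f;
        cudaSetDevice(d);
        cudaMemcpy(&left, buf[d], sizeof(float), cudaMemcpyDeviceToHost);
        cudaMemcpy(&right, buf[d] + interior + 1, sizeof(float), cudaMemcpyDeviceToHost);
        float wantLeft = (d > 0) ? float(d) : 0.0f;           // GPU d - 1 holds d
        float wantRight = (d + 1 < n) ? float(d + 2) : 0.0f;  // GPU d + 1 holds d + 2
        ok = ok && left == wantLeft && right == wantRight;
        cudaFree(buf[d]);
    }
    return ok;
}
```

With a single GPU no exchange takes place and both halo cells stay zero, so the function still returns true. `cudaMemcpyPeer` does not require peer access to be enabled, although enabling it (where supported) lets the copies go directly between GPUs.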
A release candidate of CUDA Toolkit 4.0 will be available
free of charge beginning March 4, 2011, to developers who
enroll in the CUDA Registered Developer Program at: