NVIDIA CUDA Technology

by Parm Mann on 22 October 2008, 00:00

Tags: NVIDIA (NASDAQ:NVDA)

Introduction

A graphics card is a hardware component in most PCs that enables you to play games with realistic-looking effects.

Two companies currently design and manufacture graphics cores: NVIDIA and ATI, and both companies' add-in board partners manufacture the cards themselves. NVIDIA and ATI's GPU (Graphics Processing Unit) architectures are broadly similar and both have complete cards ranging from £20 through to £350. Generally speaking, the more you spend on a card, the better the performance.

It's no surprise that both companies' products, at a similar price-point, also match up with respect to performance.

Gaming code is run through what are termed APIs (Application Programming Interfaces), and the two most common are DirectX and OpenGL. Writing code against these standardised interfaces means that it will run across a wide range of graphics hardware. OpenGL is available on several operating systems, while DirectX is specific to Microsoft Windows.

However, recent GPU designs have sought to expand the usefulness of graphics cards beyond regular gaming. The reason for doing so lies with the massively parallel computational ability of today's mid-range GPUs, which, in turn, can lead to significantly faster processing.

In terms of raw throughput, measured in GFLOPS (billions of floating-point operations per second), NVIDIA and ATI's highest-performing GPUs have the basic building blocks to execute suitably parallel algorithms much more quickly than a conventional processor (CPU), by up to 20x in certain cases. The burgeoning field behind efforts to use GPUs for compute-intensive tasks is known as GPGPU (general-purpose computing on graphics processing units).

But one cannot port existing applications that run on x86-class CPUs - Intel and AMD's, for example - to GPUs without using special coding techniques that harness the fantastic parallelism inherent in the fastest cards. OpenGL and DirectX are optimised for gaming and not general programming, remember.

NVIDIA's CUDA

NVIDIA has engineered a set of tools, first released in February 2007, that allows application developers to write programs to run on NVIDIA's hardware. The set includes a compiler for a derivation of the 'C' programming language.

Using basic 'C' with some additions, developers can now code algorithms that gain practically unfettered access to the GPUs' immense memory bandwidth and computational power.
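As a rough illustration of what 'C' with some additions looks like in practice, here is a minimal sketch that adds two arrays on the GPU. It is not taken from NVIDIA's own samples: the names add_kernel and vector_add are hypothetical and chosen for this example. The __global__ qualifier, the <<<blocks, threads>>> launch syntax and the cudaMalloc, cudaMemcpy and cudaFree calls are part of CUDA itself.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

// Kernel: every GPU thread adds exactly one pair of elements.
// __global__ is a CUDA addition to C: it marks a function that runs on
// the GPU and is launched from code running on the CPU.
__global__ void add_kernel(const float *a, const float *b, float *out, int n)
{
    // blockIdx, blockDim and threadIdx are built-in CUDA variables that
    // tell each thread where it sits in the launch grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // the last block may contain spare threads
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host-side helper (name chosen for this sketch).
// Copies the inputs to GPU memory, launches the kernel and copies the
// result back. Returns 0 on success and -1 if any CUDA call fails.
int vector_add(const float *a, const float *b, float *out, int n)
{
    if (n <= 0) {
        return 0;                // nothing to do
    }

    float *d_a = NULL, *d_b = NULL, *d_out = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    // Allocate GPU buffers and upload the two input arrays.
    bool ok = cudaMalloc((void **)&d_a, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_b, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;                          // threads per block (an assumption)
        int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
        add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);

        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Passing NULL to cudaFree is a no-op, so this is safe after a failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return ok ? 0 : -1;
}
```

A host program would call vector_add(a, b, out, n) much like any other 'C' function, and the file would be compiled with NVIDIA's nvcc compiler. For an array this simple the cost of copying data to and from the card is likely to outweigh any gain, so treat this as a sketch of the programming model rather than a benchmark.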

The tools are collectively referred to as CUDA (Compute Unified Device Architecture), and CUDA support has now been rolled into the general graphics card drivers for a wide range of GPUs. The toolkit can be downloaded from NVIDIA's website.

At the time of writing, NVIDIA supports CUDA on GeForce 8, GeForce 9, GeForce GTX 200, Mobile GeForce 8/9, Quadro, Quadro Mobile, and Tesla GPUs - which encompasses most of the shipping range.

CUDA works best when the application's workload can be split up and run concurrently on the many processing cores that make up a present-generation GPU - and this is where we can see improvements of around 10x over CPU-based processing.

Should the application be more serial in nature, the benefit of CUDA is diminished, but it still may be quicker to execute than on a CPU.
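One common way to reason about the effect of serial code is Amdahl's law, which is not part of CUDA and is brought in here purely as an illustration. It estimates the best-case speed-up when only a fraction of a program can be spread across many cores. The function name amdahl_speedup is hypothetical, and the model is idealised: it ignores memory-transfer costs and the fact that a single GPU core is not equivalent to a single CPU core.

```cuda
// Host-only code: plain C that nvcc also accepts.
//
// Amdahl's law: if parallel_fraction of the run time (0.0 to 1.0) can be
// shared between `cores` workers and the rest must run serially, the
// best-case speed-up is 1 / (serial + parallel / cores).
double amdahl_speedup(double parallel_fraction, double cores)
{
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

With 240 cores, the count on a GeForce GTX 280, amdahl_speedup(0.95, 240) comes out at roughly 18.5, whereas amdahl_speedup(0.50, 240) is just under 2. Under these assumptions, the serial half of a program quickly becomes the limit, however many cores are available.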

Practical applications

CUDA works on Windows XP, Vista, Mac OS and Linux, and CUDA-written applications have begun to surface this year.

The parallel nature of GPUs means that certain types of applications scale well with the number of cores, and those which are media- or graphics-related tend to be particularly well suited to running on a many-processor engine such as a GPU.

One of the first programs to be written via CUDA and released to the public is the BadaBOOM media converter. Testing has shown that converting from the MPEG-2 DVD format to quality-optimised H.264 is around 8x faster on a GeForce GTX 280 than on an Intel quad-core CPU using a similar encoder.

Another compute-intensive task to take advantage of CUDA is Folding@Home, a distributed application that analyses the way in which proteins fold. Knowing this will give the medical community far greater insight into how advanced biology works. Folding on a class-leading GeForce GTX 280 is up to 20x faster than on a quad-core CPU and up to 8x quicker than on the Cell processor in the PlayStation 3.

Recently, NVIDIA announced a collaboration with Adobe on its Creative Suite 4 range of programs. Photoshop CS4 ships with some CUDA/OpenGL-written speed-up enhancements that are activated if a compatible NVIDIA GPU is detected.

The same is true of After Effects CS4 and Premiere Pro CS4, with compute-intensive work farmed out to run on the GPU(s) in the system. NVIDIA also reckons that an Elemental RapiHD CUDA-written plug-in for Premiere Pro CS4 will be released in November 2008 - promising a speed-up of up to 7x when compared to a CPU's performance.

The future, and downsides

NVIDIA is putting significant resources into ensuring that CUDA is successful. We're likely to see a greater number of multimedia-related applications released that take advantage of the speed-up afforded by using a GPU over a CPU. Indeed, NVIDIA is keen to stress this very point with its Balanced PC campaign.

One obvious downside to CUDA's chances of continued success is that it is currently tied into select NVIDIA GPUs; it won't work with rival ATI's cards without extensive modification. NVIDIA will claim that it's a totally open platform where anyone with sufficient knowledge can code algorithms to run on hardware, but, really, it is limited by design. Compare this with the x86 instruction-set for CPUs and it's clear that the latter will always have a larger install base.

CUDA is just one method of running GPGPU programs. ATI abandoned its own approach, titled Close To Metal, in favour of supporting simpler higher-level languages such as Brook/Brook+. Microsoft, too, is getting in on the GPGPU act with the release of its Compute Shader in the DirectX 11 update. It can be thought of as the GPGPU equivalent of gaming's DX1x, but lifts some of the gaming-oriented constraints in favour of much broader support.

With the backing of an industry giant and owner of key operating systems, Compute Shader will be open to all DirectX 11-class hardware, irrespective of GPU designer, and we can foresee developers dropping CUDA-specific algorithms and supporting Microsoft's GPU-wide initiative instead.

Concluding thoughts

Modern graphics cards are designed to do more than just play the latest games. Specific architecture changes in the last two years - necessitated by a tightening of DX specifications - have meant that GPUs are imbued with incredible parallel processing ability that runs into the TFLOPS range for a single card. Such abundant power can be harnessed by programs whose code can be run, concurrently, over many processing cores. This can translate into a significant speed-up of many applications.

NVIDIA's seen the benefit of such a parallel approach and tapped into it by releasing a compiler and toolkit, known as CUDA, that lets application developers code in a derivation of the popular 'C' language. This approach has led to programs which execute many times faster than on even a quad-core CPU - taking advantage of the 240 cores on a GeForce GTX 280, for example.

CUDA, therefore, extends the usefulness of a compatible GPU, and we're likely to see more multimedia-related CUDA-written apps in the near future. Microsoft, too, is taking the GPGPU environment seriously with the release of its Compute Shader as part of DX11's roll out, and only time will tell which most developers opt for.

