Review: AMD Radeon HD 3870: the new midrange DX10 king?

by Tarinder Sandhu on 15 November 2007, 05:00

Tags: HIS Radeon HD 3870, AMD (NYSE:AMD)

Tell me all about Radeon HD 3870, baby

ATI Radeon HD 3870 series

Today, AMD is launching two new graphics processors - the Radeon HD 3870 512 and its little brother, the Radeon HD 3850 256. Notice the lack of super-duper suffixes? Well, for once, we're able to report that the model-numbering system makes genuine sense.

So let's decipher the top-rung card, the Radeon HD 3870: 3 (generation), 8 (family), 70 (variant). This is intrinsically faster than the Radeon HD 3850; it's as simple as that. There are no XT, Pro, GT, or Ultra suffixes in sight.
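For readers who like their naming schemes spelled out, here is a minimal sketch of the decoding described above. The `RadeonModel` type and both function names are our own illustrative inventions, and the comparison rule only covers what AMD has stated: within one generation and family, the higher variant is the faster card.

```python
import re
from typing import NamedTuple


class RadeonModel(NamedTuple):
    """Hypothetical container for the three parts of a four-digit model number."""
    generation: int
    family: int
    variant: int


def parse_model(name: str) -> RadeonModel:
    """Split the first four-digit number in `name` into generation, family, variant."""
    match = re.search(r"\b(\d)(\d)(\d{2})\b", name)
    if match is None:
        raise ValueError(f"no four-digit model number found in {name!r}")
    generation, family, variant = (int(part) for part in match.groups())
    return RadeonModel(generation, family, variant)


def is_faster(a: RadeonModel, b: RadeonModel) -> bool:
    """Compare two cards, but only inside the same generation and family."""
    if (a.generation, a.family) != (b.generation, b.family):
        raise ValueError("comparison across generations or families is not covered here")
    return a.variant > b.variant


hd3870 = parse_model("Radeon HD 3870 512")
hd3850 = parse_model("Radeon HD 3850 256")
print(hd3870, is_faster(hd3870, hd3850))
```

The trailing memory size (512 or 256) is ignored because the pattern only accepts exactly four digits.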

Radeon HD 3870 - a frugal Radeon HD 2900 XT, then?

We know that Radeon HD 3870 is the faster of the two new cards but what, exactly, is it? Let us explain with the help of the trusty block diagram.



Thinking of the past gives us a glimpse into the present. The Radeon HD 3870 GPU is based on the Radeon HD 2900 XT, albeit with a few extra bells and whistles that we'll explain in a moment. Please feel free to head back to our in-depth look at the architectural underpinnings of the 2900 XT, to save excessive reproduction of known facts.

The Radeon HD 3870 packs in a unified shader core - meaning that its scalar-based stream-processing units can be allocated to work on vertex, pixel or geometry data - with the same '320' units, arranged in four blocks (SIMDs), each containing 16 processors.

As each processor contains five superscalar arithmetic logic units (ALUs), AMD sums it all up - (4x16) x5 - to arrive at the wonderfully-large total of 320. The tessellator engine is carried over, naturally.
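The arithmetic behind that headline figure is simple enough to write down. The sketch below just multiplies out the three numbers quoted above; the constant and function names are ours, not AMD's.

```python
# Figures as quoted in the article for the Radeon HD 3870 shader core.
SIMD_BLOCKS = 4            # blocks (SIMDs) in the shader core
PROCESSORS_PER_SIMD = 16   # processors inside each SIMD
ALUS_PER_PROCESSOR = 5     # ALUs inside each processor


def total_alus(simd_blocks: int, processors_per_simd: int, alus_per_processor: int) -> int:
    """Multiply out the hierarchy: (blocks x processors) x ALUs."""
    return simd_blocks * processors_per_simd * alus_per_processor


stream_units = total_alus(SIMD_BLOCKS, PROCESSORS_PER_SIMD, ALUS_PER_PROCESSOR)
print(stream_units)  # 320
```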

Continuing the theme of harvesting from R600, Radeon HD 3870 - code-named RV670 - is equipped with an identical 16 texturing units and 16 ROPs, and all operate at the same speed as the shaders; no split-clocking here. The Radeon HD 3870's 16 FP32 texture-filter units can filter at high precision (FP32) at half speed - that's eight pixels per clock cycle (8ppc) - or run FP16 at full speed, so 16ppc.
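To make the per-clock filtering numbers concrete, here is a small sketch that encodes only what is stated above: 16 filter units, FP16 at full rate and FP32 at half rate. The dictionary and function names are illustrative assumptions, and no clock speed is plugged in because the figures are per clock cycle.

```python
TEXTURE_FILTER_UNITS = 16

# Fraction of full rate at which each precision is filtered, per the article.
RATE_BY_PRECISION = {"FP16": 1.0, "FP32": 0.5}


def filtered_pixels_per_clock(units: int, precision: str) -> int:
    """Return how many pixels the filter units can process in one clock cycle."""
    if precision not in RATE_BY_PRECISION:
        raise ValueError(f"unknown precision: {precision!r}")
    return int(units * RATE_BY_PRECISION[precision])


fp16_ppc = filtered_pixels_per_clock(TEXTURE_FILTER_UNITS, "FP16")
fp32_ppc = filtered_pixels_per_clock(TEXTURE_FILTER_UNITS, "FP32")
print(fp16_ppc, fp32_ppc)  # 16 8
```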

The ROPs are, as far as we can ascertain, identical, too, outputting a total of 16ppc, and feature the same MSAA resolve for some funky custom-filter anti-aliasing (CFAA), right up to 24x. Again, head back to our in-depth look if you're befuddled by what's going on here.

Display options

The Radeon HD 3800 family retains the audio-routing ability first seen on the previous generation's best. AMD's kept the GPU-incorporated audio controller such that your motherboard's audio device, be it onboard or discrete, can ultimately be routed via PCIe into the GPU and pushed out together with video. Microsoft's UAA driver does the routing for Vista and AMD has its own equivalent downloadable driver for Windows XP.

Again, like the HD 2000-series, the 3800 family will also feature an optional adapter that converts a regular DVI output into full HDMI (sound-routing permitting). As standard, cards will feature a couple of dual-link HDCP-compliant ports through which video and audio will pass together, rather than being routed from different ports.

So what's different?

Memory controller

The RV670 code-name implies that the Radeon HD 3870 is a value-oriented part yet it contains much of what makes the more-expensive Radeon HD 2900 XT tick along, so what's actually new?

Starting from an architectural viewpoint and then thinking about associated benefits, the memory controller has been pared down and tuned for a mid-range part. Specifically, the 3870's controller now uses a fully-distributed internal 512-bit memory-bus that hooks up with card-mounted memory through a 256-bit interface.

Now, the Radeon HD 2900 XT uses an internal 1024-bit bus and an external 512-bit interface, so the new kid on the block has, on a clock-for-clock basis, lost half the memory bandwidth available to the 2900 XT. However, AMD reckons that the 3870's controller is extremely well-tuned to hide latencies, whereas the 2900 XT's was designed to reach peak efficiency with GDDR5 memory. Whatever the case, the 3870 is bandwidth-deficient when compared to its predecessor.
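The clock-for-clock comparison can be sketched as follows. Peak bandwidth is the external bus width in bytes multiplied by the effective transfer rate; the transfer rate used here is a purely hypothetical placeholder chosen so that both cards run at the same speed, not a specification of either product.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# ASSUMPTION: an arbitrary, identical effective data rate for both cards,
# used only to illustrate the clock-for-clock comparison.
ASSUMED_TRANSFERS_PER_SECOND = 2.0e9

hd3870_bw = peak_bandwidth_gb_s(256, ASSUMED_TRANSFERS_PER_SECOND)
hd2900xt_bw = peak_bandwidth_gb_s(512, ASSUMED_TRANSFERS_PER_SECOND)

print(hd3870_bw, hd2900xt_bw, hd3870_bw / hd2900xt_bw)
```

Whatever rate is chosen, the ratio stays at one half, which is the point: the 3870 has to run its memory faster just to claw back some of the deficit.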

DX10 vs DX10.1



Although we've barely seen a trickle of DX10 games hit the market, AMD is providing hardware support for DX10.1 (SM4.1) - and that will be shipped with Windows Vista Service Pack 1 some time in H1 2008. DX10.1 is an interim upgrade that tightens the API-spec further, as you can see from the image above.

We could go into greater detail as to how and why having DX10.1 will be great for gaming but - and it's a pragmatic but - developers will have to code specifically for the additional features that it brings.

We've seen a lukewarm uptake of Vista-only DX10, so forcing games development teams to now code for an API upgrade that may not even be usable for six months or more certainly isn't an over-riding reason to consider purchasing the cards even if the features represent a step in the right direction.

PCIe 2.0

Completely new to this design is compliance with PCIe 2.0. That doubles the bandwidth to/from the graphics card and system, raising it to 8GB/s in each direction (a raw 5Gbit/s per lane across 16 lanes, before encoding overhead).
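For anyone wondering how 5Gbit/s per lane turns into 8GB/s, here is a rough worked sketch. It relies on two details not spelled out in the text above: PCIe 1.x and 2.0 both use 8b/10b line encoding (8 payload bits for every 10 bits on the wire), and PCIe 1.x signals at 2.5GT/s per lane. The function name is ours.

```python
def pcie_bandwidth_gb_s(gigatransfers_per_s: float, lanes: int,
                        payload_bits: int = 8, line_bits: int = 10) -> float:
    """Usable bandwidth per direction in GB/s after line-encoding overhead."""
    raw_bits_per_s = gigatransfers_per_s * 1e9 * lanes
    usable_bits_per_s = raw_bits_per_s * payload_bits / line_bits
    return usable_bits_per_s / 8 / 1e9  # bits -> bytes -> gigabytes


pcie1_x16 = pcie_bandwidth_gb_s(2.5, 16)  # previous generation
pcie2_x16 = pcie_bandwidth_gb_s(5.0, 16)  # what the HD 3800 series supports

print(pcie1_x16, pcie2_x16)  # 4.0 8.0
```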

We reckon that the extra bandwidth makes most sense on the Radeon HD 3850 with its 256MiB, where frame-buffer limitations dictate that significant data is pushed over the PCIe bus to system memory, which acts as a temporary store.

55nm production

AMD based the Radeon HD 2900 XT on an 80nm process. The Radeon HD 2600 and 2400 variants are produced on a power-saving 65nm process. The Radeon HD 3800 series is, wait for it, produced on a 55nm process.

Moving down process nodes is hugely important for reducing power and lowering production costs. We already know that the Radeon HD 3800 series amalgamates much of 2900 XT and has transistor counts that are reasonably analogous - 666M (3800) vs 700M (2900).

The Radeon HD 3800 series' die size is approximately 192mm², while the Radeon HD 2900 XT's is 408mm². That shrinkage directly affects the number of GPUs AMD can have manufactured on a regular wafer, meaning that the new GPUs are considerably cheaper to produce. Additionally, the nature of reducing process nodes leads to a reduction in operating voltage, too. We'll put this to the test later.
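To put a rough number on that, the sketch below estimates gross dies per wafer and transistor density from the figures quoted above. Two things here are our assumptions rather than AMD's data: a 300mm wafer, and a common textbook approximation for dies per wafer that subtracts an edge-loss term. It ignores yield, scribe lines and die aspect ratio, so treat the output as a ballpark only.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate die count: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def transistor_density(transistors_millions: float, die_area_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors_millions / die_area_mm2


ASSUMED_WAFER_DIAMETER_MM = 300  # ASSUMPTION: wafer size is not given in the article

rv670_dies = gross_dies_per_wafer(ASSUMED_WAFER_DIAMETER_MM, 192)
r600_dies = gross_dies_per_wafer(ASSUMED_WAFER_DIAMETER_MM, 408)
rv670_density = transistor_density(666, 192)
r600_density = transistor_density(700, 408)

print(rv670_dies, r600_dies)
print(round(rv670_density, 2), round(r600_density, 2))
```

Under these assumptions the smaller die yields more than twice as many candidates per wafer, and packs roughly twice as many transistors into each square millimetre.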

UVD is now omnipresent



AMD's UVD (Unified Video Decoder), which provides hardware-assisted support for decoding computationally-expensive high-definition content - primarily VC-1 and H.264 for Blu-ray and HD DVD titles - was conspicuously absent from R600 but AMD insisted that the GPU's power was such that its shaders could take on the workload.

The truth was far simpler, really, as the design was architected and taped-out before the UVD processor could be implemented, as per Radeon HD 2600/2400. UVD is now present and accounted for on both 3800 cards, we note.

PowerPlay on the desktop

Augmenting the lower-power nature of the card is the introduction of ATI's PowerPlay power-management feature. The 3800 series will draw considerably less power than the 2900 XT but their power-profile is further helped by intelligent switching and clock-gating when the GPU isn't being fully thrashed in 3D gaming.

For example, the 2900 XT is reckoned to consume around 200W when placed under heavy gaming. That figure remains the same with, shall we say, light gaming, and only idles down once the card is in 2D mode.

The 3800 series adds a third category, defined as light gaming, where parts of the GPU are gated when not in use. Numbers suggest that light gaming consumes around half the power of heavy gaming, with no loss in performance. But it's unclear exactly what constitutes light gaming.

CrossFire and CrossFireX

Multi-GPU CrossFire is also the recipient of an upgrade. Now, each 'golden-finger' provides double the intra-card bandwidth, when compared to you-know-what. It's an open secret that AMD is launching a new motherboard core-logic with potential support for four mechanical PCIe slots. The Radeon HD 3800 series can be hooked together in various CrossFire modes, including setups for two, three and four cards (CrossFireX).

The dual-card driver is available immediately and a driver for three and four cards is mooted for January 2008. In addition, CATALYST Control Center drivers will ship with enhanced control over each GPU in a multi-way arrangement.

Summing it up



The above image sums up the key performance parameters for the two initial cards in Radeon HD 3800 series and compares them with the HD 2900 XT.

The HD 3870 has faster core and memory speeds than the Radeon HD 2900 XT, though they're somewhat hobbled by a comparative lack of memory bandwidth. The Radeon HD 3850 is much the same but sacrifices clock-speeds for a lower cost.

Initial 3850 samples will be equipped with 256MiB of onboard GDDR3 memory, rather than the 512MiB GDDR4 present on the 3870, and it's the 3850 which will benefit most from one of the new introductions to the range - PCIe 2.0.

Given the musings over the architecture and other bolt-ons, the Radeon HD 3870 should benchmark somewhere in the vicinity of the 2900 XT but come in at a much lower price-point - £140.

Let's put that another way - the Radeon HD 3870 should be as quick as the HD 2900 XT but cost half as much and have significantly better power credentials.

AMD has clearly decided that it can't compete with NVIDIA's high-end single-GPU cards - the GeForce 8800 GTX and GeForce 8800 Ultra - and doesn't intend to try.

Instead, it's focusing on a competitive price-to-performance ratio for mid-range cards which, of course, are likely to sell in far larger numbers. And the little brother, the Radeon HD 3850, should ship at around £110, we're informed.