DDR Technology.
DDR SDRAM (or Double Data-Rate Synchronous Dynamic Random Access Memory, if you prefer) is hardly a new product on a conceptual level, and has been featured in consumer graphics cards for well over a year. DDR’s mass-market début came with the 32MB 166MHz memory included with nVidia’s GeForce DDR towards the end of 1999, but it apparently first appeared on the drawing board as far back as 1997. Finally, following numerous chipset and memory production delays, systems based around DDR became available to the general public at the beginning of the year.
The key innovation with DDR is, as its name suggests, the ability to transfer twice the volume of data per cycle, which it does by moving data on both the rising and falling edges of the clock signal. DDR presently comes in two flavours – PC1600 and PC2100 – with the numbers referring to their respective peak theoretical memory bandwidth figures in MB/s. PC1600 operates on a 64-bit bus at an effective 200MHz (or, more accurately, 100MHz DDR), providing a bandwidth of 1,600MB/s; similarly, PC2100, running at 133MHz DDR, offers approximately 2,100MB/s bandwidth. PC1600 was released to coincide with the launch of the AMD760 chipset, but consumers will have to wait just a little bit longer to get their hands on PC2100. It will only be once this comes out that the DDR bandwagon will really begin to gain momentum from a retail perspective, even though many vocal members of the hardware-reviewing community have (justifiably, in my opinion) spent what seems like an eternity evangelising the technology.
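As a rough check on where these figures come from, peak bandwidth is simply the bus width multiplied by the clock frequency and by the number of transfers per clock. The short Python sketch below is only an illustration: the function name is hypothetical, and it uses the nominal 100MHz and 133MHz clocks, so the PC2100 result comes out a little above the rounded 2,100 in the module's name.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: int, transfers_per_clock: int) -> int:
    """Peak theoretical bandwidth in MB/s, taking 1 MB as 10**6 bytes."""
    bytes_per_transfer = bus_width_bits // 8
    # MHz is 10**6 cycles per second, so bytes * MHz gives MB/s directly.
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# All three module types discussed here sit on a 64-bit bus.
modules = {
    "PC133 (133MHz, one transfer per clock)": peak_bandwidth_mb_s(64, 133, 1),
    "PC1600 (100MHz DDR)": peak_bandwidth_mb_s(64, 100, 2),
    "PC2100 (133MHz DDR)": peak_bandwidth_mb_s(64, 133, 2),
}

for name, bandwidth in modules.items():
    print(f"{name}: {bandwidth} MB/s")
```

The only difference between PC133 and PC2100 in this calculation is the final factor of two, which is the whole of DDR's theoretical advantage at a given clock speed.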
This naming defies the convention established by all recent memory standards including PC100 and PC133 (it is worth noting that PC150 is not an officially defined standard, but purely a marketing invention), and indeed PC600, PC700 and PC800 RDRAM (Rambus DRAM). For each of these, the number denotes the memory’s clock frequency in MHz, or effective frequency in the case of Rambus, which itself is DDR-based as it transfers 16 bits of data twice per clock cycle. Presumably it was thought that PC1600 and PC2100 were much more impressive-sounding than PC200 and PC266 respectively, and would result in DDR being perceived as a more advanced product than Rambus (which it isn’t, even though it can be faster and is far more sensibly-priced) by those less well-informed than the average hexus reader.
Now that we have determined what DDR is and why it is apparently more desirable than standard SDRAM, we need to consider the impact that DDR has on current applications and could potentially have on future ones. Many hardware enthusiasts, gamers in particular, have been disappointed by the reviews they have seen of the AMD760 and ALi MAGiK 1 chipsets for Socket A processors (AMD Thunderbird and Duron), and particularly the benchmarks contained therein.
Sadly, as is so often the case, the theoretical figures do not correspond with real-world performance. While the numbers, not to mention the DDR name, suggest that PC2100 should be twice as fast as regular PC133 (which has a bandwidth of 1,064MB/s), the reality is far from this, even under optimal conditions. The performance advantage associated with DDR depends on the type of program being run. In so-called ‘streaming’ applications, data is almost continuously sent across the memory bus so that, over time, a memory bandwidth figure reasonably close to the peak theoretical value can be realised. This explains why benchmarks have shown DDR to be considerably faster than PC133 in, for example, MPEG4 encoding.
However, the advantages of DDR in games and geometry-intensive 3D applications are less evident. For the present at least, these tend to be more heavily dependent on cache speed and size, and on latency of the entire memory subsystem, than on bandwidth. This doesn’t imply that DDR’s extra bandwidth yields no improvements, merely that there are other potentially more important factors to consider. Since I believe this latter type of application will be of more interest to hexus readers, the latency issue warrants further investigation.
Due to various bottlenecks, the real-world memory bandwidth of all SDRAM types in non-streaming situations is always much less than the theoretical bandwidth, but this is particularly the case with DDR. Foremost amongst these bottlenecks is latency itself, which, contrary to popular belief, affects performance in two ways: indirectly, by limiting the bandwidth that can actually be achieved, and more obviously (and frequently more significantly) directly, by determining how long the CPU has to wait when retrieving data from main memory.
To make matters more confusing, we need to consider latency in terms of the entire memory system. For SDRAM modules, the most commonly quoted latency figure is the CAS (column address strobe) latency; this (along with, to a lesser extent, the RAS-to-CAS delay and RAS precharge time) is only one, albeit the most important, determinant of the memory system’s latency. A crucial consequence of this is that, while DDR may offer twice the (theoretical) bandwidth of standard SDRAM, its resultant memory system latency is well over half that of PC133, and thus its real-world bandwidth is considerably less than twice that of PC133. This higher than expected latency goes some way to explaining the relatively modest speed increases (usually between 5% and 15%) in games compared with PC133.
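To see why latency eats into bandwidth, consider a deliberately crude model in which every memory request pays a fixed start-up latency before data flows at the peak rate. The Python sketch below is an illustration under stated assumptions, not a measurement: the 100ns latency is a hypothetical round number, both memory types are given the same latency for simplicity, and the function names are my own.

```python
def effective_bandwidth_mb_s(transfer_bytes: int, latency_ns: float, peak_mb_s: float) -> float:
    """Average bandwidth of one transfer: bytes moved / (start-up latency + transfer time)."""
    bytes_per_ns = peak_mb_s / 1000.0  # 1 MB/s is 10**6 bytes/s, i.e. 10**-3 bytes/ns
    total_ns = latency_ns + transfer_bytes / bytes_per_ns
    return transfer_bytes / total_ns * 1000.0


ASSUMED_LATENCY_NS = 100.0  # hypothetical figure, chosen only for illustration


def ddr_advantage(transfer_bytes: int, latency_ns: float = ASSUMED_LATENCY_NS) -> float:
    """Ratio of PC2100 to PC133 effective bandwidth for a transfer of the given size."""
    pc133 = effective_bandwidth_mb_s(transfer_bytes, latency_ns, 1064.0)
    pc2100 = effective_bandwidth_mb_s(transfer_bytes, latency_ns, 2128.0)
    return pc2100 / pc133


# A single 64-byte request versus a long, uninterrupted 1,000,000-byte stream.
print(f"64-byte request:  DDR advantage x{ddr_advantage(64):.2f}")
print(f"1,000,000 bytes:  DDR advantage x{ddr_advantage(1_000_000):.2f}")
```

Under these assumptions the small request sees only a modest gain, because most of its time is spent waiting rather than transferring, while the long stream approaches the full twofold advantage. Real systems overlap and pipeline requests, so the exact ratios should not be taken literally.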
Also important in determining latency is the front side bus (FSB) frequency. Firstly, it is important that the FSB provides sufficient bandwidth to take advantage of DDR; the GTL+ bus that all P3 chipsets are based upon runs at a mere 133MHz with PC2100, so the FSB bandwidth (1,064MB/s) is substantially less than the peak bandwidth of PC2100 RAM (2,100MB/s). Remembering that all ‘traffic’ between the CPU and chipset needs to be sent across the FSB, it is clear that the bandwidth benefits of DDR will never be fully realised on a P3 system. Similarly, RDRAM’s poor performance on P3 systems can be attributed to a combination of its high latency and the bandwidth of the GTL+ bus.
The EV6 bus around which all chipsets for Athlon-core CPUs are based operates at a frequency of 100MHz or 133MHz DDR, and hence is capable of matching the theoretical bandwidth of PC1600 and PC2100 respectively. Despite the fact that FSB bandwidth is not a concern, the FSB frequency still exerts considerable influence on performance – latency is reduced as FSB speed increases, and, perhaps more importantly, when the memory bus and FSB are synchronous (running at the same speed) rather than asynchronous. This explains why the gaming performance of the KT133A chipset is noticeably greater than that of its elder brother, KT133 – the FSB of the latter is stuck at 100MHz DDR, so FSB and memory are asynchronous when the memory frequency is 133MHz, resulting in a considerable increase in latency.
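The bandwidth half of the FSB argument in the last two paragraphs amounts to taking the smaller of two numbers: the CPU can never pull data from memory faster than the slower of the FSB and the memory bus allows. A minimal Python sketch, with hypothetical function names and nominal clock figures, and ignoring latency and synchronisation entirely:

```python
def bus_bandwidth_mb_s(bus_width_bits: int, clock_mhz: int, transfers_per_clock: int) -> int:
    """Peak bandwidth of a bus in MB/s, taking 1 MB as 10**6 bytes."""
    return (bus_width_bits // 8) * clock_mhz * transfers_per_clock


def usable_peak_mb_s(fsb_mb_s: int, memory_mb_s: int) -> int:
    """The CPU sees at most the bandwidth of the slower link in the chain."""
    return min(fsb_mb_s, memory_mb_s)


pc2100 = bus_bandwidth_mb_s(64, 133, 2)       # 133MHz DDR memory
gtl_plus_133 = bus_bandwidth_mb_s(64, 133, 1)  # P3 FSB: one transfer per clock
ev6_133 = bus_bandwidth_mb_s(64, 133, 2)       # Athlon FSB: two transfers per clock

print("P3 + PC2100:    ", usable_peak_mb_s(gtl_plus_133, pc2100), "MB/s")
print("Athlon + PC2100:", usable_peak_mb_s(ev6_133, pc2100), "MB/s")
```

On the P3 the FSB caps the usable peak at the PC133 figure, whereas the EV6 bus passes the full PC2100 bandwidth through. What this sketch cannot show is the latency penalty of running the two buses asynchronously, which is the more important effect in games.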
Taking all the above into consideration, it has been calculated by those with a more detailed knowledge of such matters than myself that, while CAS2 PC2100 (predictably) offers the lowest overall latency, its margin of victory over CAS2 PC133 is somewhat less than might be expected. Furthermore, thanks to its lower operating speed and the slower FSB, both running at 100MHz DDR, PC1600 on AMD760 actually has a higher latency than PC133 on KT133A. As a result, any gains in bandwidth will at least partly be offset by increased latency in games and similar applications. The highly informative source article containing these figures can be read at http://www.aceshardware.com/Spades/read.php?article_id=5000183, although it is worth reading the first article in the series (at http://www.aceshardware.com/Spades/read.php?article_id=5000172) beforehand.
Finally we can see how everything fits together – why benchmarks show that games tend to benefit less from DDR than streaming applications, why KT133A tends to perform closer to AMD760 than KT133 in games, why PC1600 offers minimal speed improvements over PC133, and so on. Before we finish, however, there are just a few more issues that need to be mentioned.
The impact of latency may be considerable on current games and less pronounced on streaming applications, but that’s not to say that streaming applications don’t benefit from low latency or that games will never take full advantage of bandwidth. Low latency is always beneficial, as is high bandwidth, but when data is being continuously streamed from memory, latency only affects performance when the streaming commences. Bandwidth will become more important for future games, due to the exponential increase in polygon counts that is likely to arise from the popularisation of the GeForce 3 or other cards with similar programmable T&L functions. However, don’t expect polygon counts to be significant enough for this to happen for another 9-12 months or so, perhaps when the X-Box conversions start arriving.
Beyond the imminent release of PC2100, there are a couple of products that could provide an additional boost to DDR. VIA’s KT266 chipset is set to be unveiled in the not-too-distant future, and is rumoured to be faster than both AMD and ALi’s offerings. Given that AMD760 is a very ‘immature’ platform compared with VIA’s Athlon chipsets, which have existed in various guises for some time now, it seems reasonable to assume that KT266 will be a solid performer, and may well bring DDR into the mainstream. Looking a little further ahead, we can expect to see PC2600 operating at 166MHz DDR, and perhaps even a 100MHz (400MHz effective) QDR module. Both these hypothetical products are guaranteed successes provided they are released within an appropriate timeframe at a reasonable price, and have strong motherboard support. Needless to say, they will be faster than PC2100 in both bandwidth and latency terms, though 100MHz QDR could suffer in the latter department.
There is but one question left unanswered by this – is it worth upgrading to DDR? I would say that it depends on your circumstances. DDR is unusual amongst new memory technologies in that it is actually offering tangible, sometimes even large, performance benefits on release, whereas previous innovations in memory (EDO, PC100, PC133 etc.) have either had negligible effects or have taken time to prove their worth. If you are intending to buy a high-performance AMD system in the near future, then, provided pricing isn’t excessive compared to current PC1600 levels, it would seem foolish to overlook DDR. I’d strongly recommend not settling for less than CAS2 PC2100, and it may also be worthwhile waiting to see what the KT266 is like. Unless you’re left seriously out of pocket, I can’t see it being a decision you’ll regret.
For those who have recently upgraded, or weren’t thinking of upgrading for some time, it may be better to wait and see what the future holds. PC2600, assuming it exists, seems like a good bet, and due to its high operating speed is as close to future-proof as any memory can be expected to be. By the time it becomes available, the benefits of DDR are likely to be more evident than at present, in games at least. Depending on the chosen FSB frequency, it may also be compatible with AMD’s next generation K8 CPUs set to appear sometime during 2002.
Either way, the majority who don’t intend to buy a P4 anytime soon have no choice but to look towards DDR (and, given Intel’s recent efforts to sever links with Rambus, DDR may eventually become the memory of choice for the P4 as well). It may not be everything you hoped for, but there’s no denying its current benefits and potential for the future.