Review: AMD Socket AM2: Athlon 64 FX-62 and nForce5 590 SLI

by Ryszard Sommefeldt on 23 May 2006, 05:00

New K8 core revision

I told you that new K8 core revisions are needed to bring Sempron and Athlon 64 to AM2, but first a refresher on K8 itself, should you fancy brushing up on your architecture knowledge. Skip this bit and read the revision guide further down the page if you're already clued up.

AMD K8

I'll go over K8 backwards, covering the processing core first and the front end that feeds it last. Hopefully it'll make some sense come the end.

At its heart, K8 is a design with three integer ALUs and three floating-point units. The three integer ALUs are essentially duplicates of one another. Each is paired with an address unit and fed by an 8-deep queue of instructions set up by the CPU's instruction scheduler. The floating-point units are different: each is tasked with a different set of instructions. One handles ADD, one handles MUL, and an 'other' unit does the rest of the FP ops. The floating-point hardware is fed by a 12-deep queue of instructions, again set up by the scheduler.
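As a rough illustration of that layout, here is a toy Python model of the scheduler queues. It is a sketch under stated assumptions, not AMD's logic:

- The class name, constants and op names are hypothetical.
- Integer ops are steered to a lane purely by their position in each group of three.
- Only ADD and MUL ops are mapped explicitly; everything else is sent to the 'other' unit.
- The 8-deep integer queues and the 12-deep FP queue come from the text above.

```python
from collections import deque

INT_LANES = 3         # three integer ALU + address-unit pairs
INT_QUEUE_DEPTH = 8   # 8-deep instruction queue per integer lane
FP_QUEUE_DEPTH = 12   # one 12-deep queue for the FP hardware

# Hypothetical op names mapped to the FP ADD and MUL units; in this toy
# model anything not listed goes to the 'other' unit.
FP_UNIT_FOR = {"fadd": "ADD", "fsub": "ADD", "fmul": "MUL"}


class ExecCoreModel:
    """Toy model of K8's execution-core queues (illustrative, not AMD's logic)."""

    def __init__(self):
        self.int_queues = [deque() for _ in range(INT_LANES)]
        self.fp_queue = deque()

    def issue_int(self, group):
        """Queue up to three integer ops, one per lane by position.

        Returns False and queues nothing if any target lane is full,
        which is where the front end would have to stall.
        """
        if len(group) > INT_LANES:
            raise ValueError("at most three integer ops per group")
        if any(len(self.int_queues[lane]) >= INT_QUEUE_DEPTH
               for lane in range(len(group))):
            return False
        for lane, op in enumerate(group):
            self.int_queues[lane].append(op)
        return True

    def issue_fp(self, op):
        """Queue one FP op tagged with the unit that would execute it."""
        if len(self.fp_queue) >= FP_QUEUE_DEPTH:
            return False
        self.fp_queue.append((op, FP_UNIT_FOR.get(op, "OTHER")))
        return True
```

Eight groups of three ops fill every integer lane in this model. A ninth `issue_int` call then returns False until the queues drain, which is the "keep it busy, but don't overflow" balance the scheduler has to strike.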

The front end that feeds all of that is the fetch/prefetch logic, which works on a stream of instructions. Those instructions are fetched, aligned and decoded into micro-ops. Meanwhile, the front end's branch unit redirects fetch whenever the code branches. Branching is an expensive operation, so the CPU works to keep its cost to a minimum.

The resulting ops are split into integer and FP groups and fed to the execution core as outlined above.
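The split itself can be pictured with a few lines of Python. This is a hedged sketch: the `FP_OPS` set and the `split_stream` helper are hypothetical, and matching on op names stands in for the tagging real hardware does during decode.

```python
# Hypothetical set of FP op names, for illustration only.
FP_OPS = {"fadd", "fsub", "fmul", "fdiv", "fld", "fst"}


def split_stream(ops):
    """Split a decoded op stream into integer and FP groups.

    Program order is kept within each group; real hardware tags each op
    during decode rather than matching names.
    """
    int_ops, fp_ops = [], []
    for op in ops:
        (fp_ops if op in FP_OPS else int_ops).append(op)
    return int_ops, fp_ops


int_ops, fp_ops = split_stream(["mov", "fadd", "add", "fmul", "cmp"])
print(int_ops, fp_ops)  # ['mov', 'add', 'cmp'] ['fadd', 'fmul']
```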

The main integer pipeline is 12 stages long. It has two fetch stages, two decode stages, two pack stages, two dispatch and schedule stages and two cache access stages, plus one stage each for instruction selection and execution. That relatively short pipe gives the K8 core much of its code execution efficiency, not least because a mispredicted branch leaves fewer stages to flush and refill.
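A quick tally confirms that the stage breakdown adds up to the quoted 12. The labels are paraphrased from the paragraph above; the snippet only checks the arithmetic.

```python
# Stage counts for K8's integer pipe, as listed in the text above.
PIPE_STAGES = {
    "fetch": 2,
    "decode": 2,
    "pack": 2,
    "dispatch/schedule": 2,
    "cache access": 2,
    "instruction select": 1,
    "execute": 1,
}

total = sum(PIPE_STAGES.values())
print(f"Integer pipeline depth: {total} stages")  # Integer pipeline depth: 12 stages
```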

K8's fetch stages work from the CPU's L1 cache. 'Full' K8 implementations have 128KiB of L1 in total, split 50/50 between instructions and data. The two decode stages can decode up to three x86 instructions per cycle into micro-ops, depending on the size of each instruction. x86 instructions vary in length, which is one of the hallmarks of a CISC rather than a RISC instruction set.
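Why variable length matters for decode can be sketched in Python. This is a simplified model under stated assumptions:

- Each cycle's instructions must fit in one 16-byte fetch block. That block size is our assumption, not a figure from this article.
- Alignment effects are ignored.
- The helpers `instruction_starts` and `decode_cycles` are hypothetical.

The up-to-three-per-cycle limit comes from the text, and the 15-byte cap is x86's architectural maximum instruction length.

```python
MAX_X86_LEN = 15  # architectural maximum length of one x86 instruction, in bytes


def instruction_starts(lengths):
    """Byte offset of each instruction; every start depends on all earlier lengths."""
    starts, offset = [], 0
    for n in lengths:
        starts.append(offset)
        offset += n
    return starts


def decode_cycles(lengths, per_cycle=3, window=16):
    """Cycles to decode a run of instructions in a simplified model: at most
    `per_cycle` instructions per cycle, all fitting inside one `window`-byte block."""
    if any(n < 1 or n > MAX_X86_LEN for n in lengths):
        raise ValueError("x86 instructions are 1 to 15 bytes long")
    cycles, i = 0, 0
    while i < len(lengths):
        used = count = 0
        while (i < len(lengths) and count < per_cycle
               and used + lengths[i] <= window):
            used += lengths[i]
            count += 1
            i += 1
        if count == 0:
            raise ValueError("instruction does not fit in the fetch window")
        cycles += 1
    return cycles


print(instruction_starts([3, 1, 5]))      # [0, 3, 4]
print(decode_cycles([1, 2, 3, 4, 5, 6]))  # 2
print(decode_cycles([15, 15, 15]))        # 3
```

Short instructions pack three to a cycle. Long ones can drop throughput to one per cycle, because the decoder can't know where the next instruction starts until it has sized the current one.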

The branch hardware has a 2K-entry buffer of branch information. It uses this to predict which way code will branch, minimising mispredictions and the penalty they carry.

L2 cache doesn't mirror the contents of L1 as it does in some other CPU architectures. Instead it's a retirement (victim) cache for data no longer needed in L1, including instruction decode information, which saves redecoding those instructions.

On modern K8 revisions, cache latency is around 50 cycles at most for L2 and single-digit cycles for L1. Bandwidth is around 10GB/sec and 30GB/sec respectively. All of that helps keep the front end busy generating an op stream for the execution hardware.
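The "retirement cache" behaviour can be illustrated with a toy two-level exclusive cache in Python. This is a sketch, not K8's implementation:

- `ExclusiveCache` is a hypothetical name.
- Both levels are modelled as fully associative LRU stores of whole lines. K8's real caches are set-associative.
- Memory fills go straight into L1, and L2 only ever receives lines evicted from L1.

```python
from collections import OrderedDict


class ExclusiveCache:
    """Toy two-level exclusive cache: L2 holds only lines evicted from L1 (LRU)."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # address -> None, oldest entry first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def _fill_l1(self, addr):
        """Insert a line into L1; the LRU victim, if any, moves down into L2."""
        self.l1[addr] = None
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            self.l2[victim] = None
            if len(self.l2) > self.l2_lines:
                self.l2.popitem(last=False)  # oldest L2 line is dropped entirely

    def access(self, addr):
        """Return where the line was found: 'L1', 'L2' or 'MEM'."""
        if addr in self.l1:
            self.l1.move_to_end(addr)  # refresh LRU position
            return "L1"
        if addr in self.l2:
            del self.l2[addr]  # exclusive: the line leaves L2 as it moves up
            self._fill_l1(addr)
            return "L2"
        self._fill_l1(addr)  # miss: fill from memory straight into L1
        return "MEM"
```

Because no line is ever held in both levels, a core's usable capacity is L1 plus L2. For a core with 1MiB of L2, that's 128KiB + 1MiB = 1,152KiB.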

And that's pretty much it, as a somewhat high-level overview of how K8 goes about its business. The on-die memory controller is a big part of the master plan to keep the front-end hardware as busy as possible. When K8 needs a trip to main memory, it wants the data back as quickly as possible. That's why low memory access latency is, time and time again, one of the defining reasons Athlon 64 performs so well.
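The textbook average memory access time (AMAT) formula shows why that latency matters: AMAT = hit time + miss rate × miss penalty, applied level by level. The sketch below is illustrative only. The function names are hypothetical, and every cycle count and miss rate is an assumed figure, not a K8 measurement.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


def two_level_amat(l1_hit, l1_miss, l2_hit, l2_miss, mem_latency):
    """AMAT for L1 -> L2 -> DRAM; the L1 miss penalty is itself the L2 AMAT."""
    return amat(l1_hit, l1_miss, amat(l2_hit, l2_miss, mem_latency))


# Hypothetical figures (cycles and local miss rates), for illustration only.
fast = two_level_amat(3, 0.05, 12, 0.2, mem_latency=150)
slow = two_level_amat(3, 0.05, 12, 0.2, mem_latency=250)
print(f"{fast:.2f} vs {slow:.2f} cycles")  # 5.10 vs 6.10 cycles
```

With these made-up numbers, only 1% of accesses go all the way to DRAM. Even so, cutting memory latency by 100 cycles lowers the average cost of every access by a full cycle.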

K8 doesn't like to be idle, essentially, and it's built around being busy in order to extract maximum performance. That seems obvious, but it's not easy, and K8's design is all about making it happen.

The three-way integer and FP units then finish it off, making it easy for technologists to enthuse about the real-world performance of K8 CPUs. Performance in that area is where both AMD and Intel will innovate in their processor cores in the coming months and years, with ever larger caches to keep it all working.

That's K8 in a somewhat technical nutshell. It's been a while since we presented it to you in such a fashion at HEXUS (if we ever have, if memory serves).

Revision F

The latest K8 revision on Socket 939 and 940 for Athlon 64 and Opteron is called rev. E. Rev. E, built on 90nm SOI, brought lower latency L2, power reductions, higher clocks, SSE3, better data prefetch and memory controller improvements, and it's what powers the latest K8 microprocessors, including Athlon 64 X2 and dual-core FX.

Rev. F builds on that with a new DDR2 memory controller, virtualisation technology (more on which shortly) and further power tweaks, although what else is new compared to rev. E is unclear. Here are the hard specs.

| Specification | AMD Athlon 64 FX-62 | Orleans | Manila |
|---|---|---|---|
| Product | Athlon 64 X2, FX | Athlon 64 | Sempron |
| L1 cache per core | 64KiB data, 64KiB instruction | 64KiB data, 64KiB instruction | 64KiB data, 64KiB instruction |
| L2 cache per core | 512KiB or 1MiB | 512KiB or 1MiB | 256KiB or 512KiB |
| System link | HyperTransport, 1 link, 2000MT/s, 1000MHz | HyperTransport, 1 link, 2000MT/s, 1000MHz | HyperTransport, 1 link, 2000MT/s, 1000MHz |
| Memory controller | Dual-channel, DDR2-800, 128-bit | Dual-channel, DDR2-667, 128-bit | Dual-channel, DDR2-667, 128-bit |
| Process technology and fab | 90nm SOI, Fab30 Dresden | 90nm SOI, Fab30 Dresden | 90nm SOI, Fab30 Dresden |
| Transistor count and die size | 227.4M, 230mm² (1MiB per core); 153.8M, 183mm² (512KiB per core) | 81.1M, 103mm² | Unknown |
| TDPmax, ICCmax, voltage range | 125W, 90A, 1.30V-1.40V | 89W, 68A, 1.30V-1.40V | Unknown |
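The table's memory and link figures translate into theoretical peak bandwidth: transfer rate × bus width in bytes. This is a hedged sketch, not a measurement:

- `peak_bandwidth_gbs` is a hypothetical helper.
- The 16-bit-per-direction HyperTransport link width is our assumption; the table doesn't state it.
- DDR2-667 uses its nominal 667MT/s figure.
- Real-world throughput will be lower.

```python
def peak_bandwidth_gbs(megatransfers_per_s, width_bits):
    """Theoretical peak in GB/s (decimal): transfers per second x bytes per transfer."""
    return megatransfers_per_s * (width_bits / 8) / 1000


ddr2_800 = peak_bandwidth_gbs(800, 128)    # dual-channel (128-bit) DDR2-800
ddr2_667 = peak_bandwidth_gbs(667, 128)    # dual-channel (128-bit) DDR2-667
# Assumption: a 16-bit HyperTransport link in each direction at 2000MT/s.
ht_per_dir = peak_bandwidth_gbs(2000, 16)

print(f"DDR2-800: {ddr2_800:.1f}GB/s, DDR2-667: {ddr2_667:.1f}GB/s, "
      f"HT: {ht_per_dir:.1f}GB/s each way")
# DDR2-800: 12.8GB/s, DDR2-667: 10.7GB/s, HT: 4.0GB/s each way
```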

AMD sampled us with Athlon 64 FX-62, the flagship AM2 processor. It sports two rev. F K8 cores, each with 1MiB of L2, plus a 2.8GHz clock frequency and a 1.40V operating voltage. Let's see it up close.