Review: Nvidia Turing Architecture Examined And Explained

by Tarinder Sandhu on 14 September 2018, 14:00

GeForce RTX 2080 Ti and RTX 2080 Speeds And Feeds

There's been bountiful talk about the Turing architecture and, at times, reference to the TU102 and TU104. These codenames refer to the specific silicon implementations that the various GeForce RTX 20-series cards use.

GeForce RTX 2080 Ti

Here is that TU102 full-config block diagram again. As noted earlier but worth repeating, it's a biggie. The Nvidia GeForce RTX 2080 Ti is not a full implementation of this die, just as GeForce GTX 1080 Ti was not the full GP102. There are at least two reasons why this is the case. The first is that Nvidia typically reserves the full-fat die for Titan-class cards - a Titan Xt, perhaps - where it can charge a much higher premium. The second, more obvious reason is that it wants to reserve the full-on TU102 for the vastly more expensive Quadro RTX 6000.

Still, GeForce RTX 2080 Ti is certainly no shrinking violet. It uses 68 of the maximum 72 SMs, meaning 4,352 shaders. Because the Tensor cores, RT cores, geometry units, texture units and ROPs are all tied together from a ratio perspective, RTX 2080 Ti drops them to 544, 68, 34, 272, and 88, respectively. There's also the associated reduction in total cache, register files and, crucially, memory-bus width, which drops from 384 bits to 352. At the same speed, RTX 2080 Ti has about 95 per cent of the shader resources of a full-on TU102. If those numbers give you a headache, it's easier to imagine RTX 2080 Ti as a TU102 with four of its SMs deactivated.
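To make the ratio argument concrete, here is a minimal Python sketch that derives the unit counts from the number of enabled SMs and the memory-bus width. The per-SM and per-32-bit-channel ratios are assumptions inferred from the figures quoted in this article (for example, 4,352 shaders across 68 SMs is 64 per SM), and the function name `turing_config` is purely illustrative.

```python
# Ratios inferred from the figures quoted in this article; they are
# assumptions for illustration, not an official Nvidia specification.
SHADERS_PER_SM = 64
TENSOR_CORES_PER_SM = 8
RT_CORES_PER_SM = 1
TEXTURE_UNITS_PER_SM = 4
SMS_PER_GEOMETRY_UNIT = 2        # one geometry unit per pair of SMs
ROPS_PER_32BIT_CHANNEL = 8       # ROPs scale with the memory bus
L2_KB_PER_32BIT_CHANNEL = 512    # so does the L2 cache


def turing_config(sms, bus_bits):
    """Derive unit counts from the enabled SMs and memory-bus width."""
    channels = bus_bits // 32
    return {
        "shaders": sms * SHADERS_PER_SM,
        "tensor_cores": sms * TENSOR_CORES_PER_SM,
        "rt_cores": sms * RT_CORES_PER_SM,
        "geometry_units": sms // SMS_PER_GEOMETRY_UNIT,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        "rops": channels * ROPS_PER_32BIT_CHANNEL,
        "l2_kb": channels * L2_KB_PER_32BIT_CHANNEL,
    }


full_tu102 = turing_config(sms=72, bus_bits=384)
rtx_2080_ti = turing_config(sms=68, bus_bits=352)

print(rtx_2080_ti)
share = rtx_2080_ti["shaders"] / full_tu102["shaders"]
print(f"Share of full TU102 shaders: {share:.1%}")  # 94.4%
```

Under the same assumed ratios, `turing_config(sms=46, bus_bits=256)` reproduces the RTX 2080 figures of 2,944 shaders, 368 Tensor cores, 46 RT cores and 64 ROPs.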

GeForce RTX 2080

The RTX 2080, meanwhile, uses the TU104 die that we have also spoken about. In its complete form, shown above, it uses 13.6bn transistors, is built on the same 12nm process, and has 48 SMs that are identical to those in TU102. Of course, being a smaller die means there's less cache, a narrower 256-bit memory interface, and up to 8GB of GDDR6 operating at that same 14Gbps.

As you might have guessed, the RTX 2080 isn't the full implementation, either, as it carries 46 SMs instead of 48. The Tensor and RT cores drop commensurately because they live inside the SMs, but Nvidia keeps the full 256-bit memory bus, all 64 ROPs and the same 4MB of L2 cache as the full TU104, depicted above.

We're glad to see that Nvidia hasn't reduced the memory speed: it remains at 14Gbps and uses the same GDDR6 memory. It gets confusing to know exactly what is going on because so many numbers are floating around, and we haven't even talked about frequencies yet, so let's jot them down in a table that also includes the last-gen GeForce GTX 1080 Ti and GTX 1080.

 
| | GeForce RTX 2080 Ti (FE) | GeForce GTX 1080 Ti | GeForce RTX 2080 (FE) | GeForce GTX 1080 |
|---|---|---|---|---|
| Launch date | September 2018 | March 2017 | September 2018 | May 2016 |
| Codename | TU102 | GP102 | TU104 | GP104 |
| Architecture | Turing | Pascal | Turing | Pascal |
| Process (nm) | 12 | 16 | 12 | 16 |
| Transistors (bn) | 18.6 | 12 | 13.6 | 7.2 |
| Die Size (mm²) | 754 | 471 | 545 | 314 |
| Core Clock (MHz) | 1,350 | 1,480 | 1,515 | 1,607 |
| Boost Clock (MHz) | 1,545/1,635 | 1,582 | 1,710/1,800 | 1,733 |
| Shaders | 4,352 | 3,584 | 2,944 | 2,560 |
| GFLOPS | 13,448/14,231 | 11,340 | 10,598 | 8,873 |
| Tensor Cores | 544 | - | 368 | - |
| RT Cores | 68 | - | 46 | - |
| Memory Size | 11GB | 11GB | 8GB | 8GB |
| Memory Bus | 352-bit | 352-bit | 256-bit | 256-bit |
| Memory Type | GDDR6 | GDDR5X | GDDR6 | GDDR5X |
| Memory Clock | 14Gbps | 11Gbps | 14Gbps | 10Gbps |
| Memory Bandwidth (GB/s) | 616 | 484 | 448 | 320 |
| ROPs | 88 | 88 | 64 | 64 |
| Texture Units | 272 | 224 | 184 | 160 |
| L2 cache (KB) | 5,632 | 2,816 | 4,096 | 2,048 |
| Power Connector | 8-pin + 8-pin | 8-pin + 6-pin | 8-pin + 6-pin | 8-pin |
| TDP (watts) | 250/260 | 250 | 215/225 | 180 |
| Current MSRP | $999/$1,199 | $699 | $699/$799 | $499 |

Specs Comparo

Nvidia has two sets of specifications for both new cards. There's the base spec that a number of add-in card partners will adhere to, then there are the Founders Edition cards that, for the first time, offer a higher peak boost clock - an extra 90MHz for both the RTX 2080 Ti (1,635MHz versus 1,545MHz) and the RTX 2080 (1,800MHz versus 1,710MHz). Nvidia reckons this makes sense because it has improved the cooling on the FE design significantly.
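For anyone wanting to see how the boost clocks feed into the GFLOPS row of the table, here is a short Python sketch. It assumes the usual peak-FP32 convention of two floating-point operations per shader per clock (one fused multiply-add); the function name `peak_fp32_gflops` is illustrative only.

```python
def peak_fp32_gflops(shaders, boost_mhz):
    """Theoretical peak FP32 throughput in GFLOPS.

    Assumes 2 FLOPs per shader per clock (one fused multiply-add).
    """
    return shaders * 2 * boost_mhz / 1000


# (shaders, base-spec boost MHz, Founders Edition boost MHz), from the table
cards = {
    "RTX 2080 Ti": (4352, 1545, 1635),
    "RTX 2080": (2944, 1710, 1800),
}

for name, (shaders, base_boost, fe_boost) in cards.items():
    base = peak_fp32_gflops(shaders, base_boost)
    fe = peak_fp32_gflops(shaders, fe_boost)
    print(f"{name}: {base:,.0f} GFLOPS base spec, {fe:,.0f} GFLOPS FE "
          f"(+{fe_boost - base_boost}MHz)")
```

The results line up with the table's 13,448/14,231 GFLOPS for the RTX 2080 Ti. The RTX 2080's single figure of 10,598 GFLOPS corresponds to the FE boost clock of 1,800MHz.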

If you have managed to get this far, well done, though you will appreciate that peak specs don't do the new RTX cards full justice. Having more shaders is one thing, but the spec table cannot accommodate performance improvements from, say, the refined SM unit, RT cores, Tensor cores, etc.

Even so, there's a fair bit more FP32 TFLOPS on the table when compared to their model-equivalent 10-series FE cards. RTX 2080 Ti FE, for example, has 25 per cent more pure shader power; RTX 2080 FE has around 20 per cent. Nominal bandwidth is up 27 per cent and 40 per cent, respectively, while power has increased to take into account the extra performance.
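Those percentages can be checked against the table with a few lines of Python. This is only a sketch of the arithmetic: bandwidth is taken as bus width in bytes multiplied by the per-pin data rate, and the helper names `bandwidth_gbs` and `uplift_pct` are made up for illustration.

```python
def bandwidth_gbs(bus_bits, gbps_per_pin):
    """Nominal memory bandwidth in GB/s: bus width in bytes x data rate."""
    return bus_bits / 8 * gbps_per_pin


def uplift_pct(new, old):
    """Percentage increase of new over old."""
    return (new / old - 1) * 100


# FE GFLOPS figures and memory specs taken from the table above
shader_2080ti = uplift_pct(14231, 11340)
shader_2080 = uplift_pct(10598, 8873)
bw_2080ti = uplift_pct(bandwidth_gbs(352, 14), bandwidth_gbs(352, 11))
bw_2080 = uplift_pct(bandwidth_gbs(256, 14), bandwidth_gbs(256, 10))

print(f"Shader power: +{shader_2080ti:.1f}% (2080 Ti), +{shader_2080:.1f}% (2080)")
print(f"Bandwidth:    +{bw_2080ti:.1f}% (2080 Ti), +{bw_2080:.1f}% (2080)")
```

Because both Ti cards share a 352-bit bus and both non-Ti cards a 256-bit bus, the bandwidth gain comes entirely from the faster GDDR6 data rate.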

One would expect the RTX 2080 Ti to be in a performance league of its own even if the Tensor and RT cores are sat idle; it's simply a lot more powerful. We'd expect it to be at least 50 per cent faster, on average, than the GeForce GTX 1080 Ti at 4K. The RTX 2080, meanwhile, ought to be 10-15 per cent faster than the Ti champ of the last generation.

Specs tell you one thing; real-world performance can be something else, and we'll know exactly how these two new cards fare in current games in the next few days.