Cerebras 7nm WSE-2 processor packs in 2.6 trillion transistors

by Mark Tyson on 21 April 2021, 12:11

Quick Link: HEXUS.net/qaeqia

Add to My Vault: x

Please log in to view Printer Friendly Layout

Cerebras astonished tech watchers back in August 2019 when it unveiled an AI processor dubbed the Wafer Scale Engine (WSE). While most chips are made from small portions of a 12-inch silicon wafer, this is a single chip interconnected on a single wafer. This meant that the WSE featured 1.2 trillion transistors, for 400,000 AI computing cores over its 21.5cm sq (8.5inch sq) area. The TSMC 16nm fabricated Cerebras WSE also featured 18GB of local superfast SRAM with 9 petabytes per second bandwidth. Compare that to Nvidia's largest GPU (the A100) with 'just' 54 billion transistors and 7,433 cores.

 

WSE

WSE-2

Nvidia A100

Area

46,255mm2

46,255mm2

826mm2

Transistors

1.2 trillion

2.6 trillion

54.2 billion

Cores

400,000

850,000

6,912 + 432

On-chip memory

18GB

40GB

40MB

Memory bandwidth

9PB/s

20PB/s

1,555GB/s

Fabric bandwidth

100Pb/s

220Pb/s

600GB/s

Fabrication process

16nm

7nm

7nm

 

At Hot Chips in August 2020, Cerebras started to tease a second gen WSE, which would be fabricated by TSMC on its 7nm process. The revamped full wafer chip would pack 2.6 trillion transistors for 850,000 AI optimised cores the chip designers teased. Now the 7nm-based WSE-2 has been fully unveiled.

The WSE-2 will be the power behind the Cerebras CS-2, "the industry’s fastest AI computer, designed and optimized for 7nm and beyond". According to Cerebras the new AI computer will be more than twice as fast as the first-gen CS-1. "In AI compute, big chips are king, as they process information more quickly, producing answers in less time – and time is the enemy of progress in AI," explained Dhiraj Mallik, Vice President Hardware Engineering, Cerebras Systems. "The WSE-2 solves this major challenge as the industry’s fastest and largest AI processor ever made."

Performance comparisons claimed by Cerebras are that, depending upon workload, the CS-2 "delivers hundreds or thousands of times more performance than legacy alternatives". Moreover, it is said to do so while consuming a fraction of the power in a relatively small system form factor (26-inches tall). A single CS-2 can thus "replaces clusters of hundreds or thousands of graphics processing units (GPUs) that consume dozens of racks, use hundreds of kilowatts of power, and take months to configure and program," says Ceberas.

The CS-1 was deployed in the Argonne National Laboratory and by pharmaceuticals firm GSK previously. Representatives from both these organisations are looking forward to deliveries of CS-2 systems, as the CS-1 surpassed expectations in accelerating complex training, simulation, and prediction models.



HEXUS Forums :: 9 Comments

Login with Forum Account

Don't have an account? Register today!
How many of these do they have to make before they get one with no damage/errors that is completely functional???
1 because they have 100% yeild by designing a system in which any manufacturing defect can be bypassed – there are extra cores to allow for defects.
Exciting times for datacentre people but I am not a big fan of the internet turning into the internet of datacentres of large websites and services

It's good to see that people can still keep AMD and Intel on their toes. Though of course ARM has been doing this for decades
What is the hashrate of one of these,,,,,, everything to get miners to look elsewhere. :mrgreen:
ronray
1 because they have 100% yeild by designing a system in which any manufacturing defect can be bypassed – there are extra cores to allow for defects.

Indeed. Rate it at 95% of nominal capability and you have plenty of room to play with. Or just put a great big * by the figures.

*nominal values based on perfect fabricobblin' - your experience may vary considerably.


Anyone know if this can play Crysis?