by asdfasdf1 832 days ago
- Interconnect between WSE-2's chips in the cluster was 150GB/s, much lower than NVIDIA's 900GB/s.

- Non-sparse fp16 on WSE-2 was 7.5 PFLOPS (about 8 H100s, 10x worse performance per dollar)
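A rough sanity check of the two bullets above, as a back-of-envelope sketch. It reads the WSE-2 figure as 7.5 PFLOPS (the "about 8 H100s" equivalence only works at petaflop scale) and assumes roughly 0.99 PFLOPS dense fp16 per H100. That H100 number and the helper names are assumptions, not something stated in the thread. No prices are plugged in; the last helper only shows what price ratio the "10x worse per dollar" claim would imply.

```python
# Back-of-envelope check of the interconnect and fp16 claims.
# Assumption (not from the thread): an H100 delivers ~0.99 PFLOPS dense fp16.
H100_DENSE_FP16_PFLOPS = 0.99


def h100_equivalents(system_pflops: float,
                     h100_pflops: float = H100_DENSE_FP16_PFLOPS) -> float:
    """Number of H100s needed to match a system's dense fp16 throughput."""
    return system_pflops / h100_pflops


def implied_price_in_h100s(equivalents: float, per_dollar_penalty: float) -> float:
    """Price, measured in H100s, at which a system matching `equivalents`
    H100s would be `per_dollar_penalty` times worse in FLOPS per dollar."""
    return equivalents * per_dollar_penalty


wse2_equiv = h100_equivalents(7.5)                    # 7.5 PFLOPS, per the bullet
interconnect_gap = 900 / 150                          # NVIDIA vs WSE-2 cluster link, GB/s
price_ratio = implied_price_in_h100s(wse2_equiv, 10)  # the "10x worse" claim

print(f"WSE-2 is roughly {wse2_equiv:.1f} H100s of dense fp16")
print(f"Interconnect gap: {interconnect_gap:.0f}x")
print(f"'10x worse per dollar' implies a price of about {price_ratio:.0f} H100s")
```

Swap in a different per-GPU throughput or penalty factor to see how sensitive the comparison is to those assumptions.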

Does anyone know the WSE-3 numbers? The datasheet seems to be lacking loads of details.

Also, 2.5 million USD for 1 x WSE-3, why just 44GB tho???

4 comments

>> why just 44GB tho???

You can order one with 1.2 Petabytes of external memory. Is that enough?

"External memory: 1.5TB, 12TB, or 1.2PB"

https://www.cerebras.net/press-release/cerebras-announces-th...

"214Pb/s Interconnect Bandwidth"

https://www.cerebras.net/product-system/

I can't find the memory bandwidth to that external memory. Did they publish this?

44GB is the SRAM on a single device, comparable to the 50MB of L2 on the H100. There is also a lot of directly attached DRAM.

No, it's comparable to the 230MB of SRAM on a Groq chip, since both are SRAM-only chips that can't really use external memory.

Because SRAM stopped getting smaller with recent nodes.

Is that 150GB/s between elements that expect to run tightly coupled processes together? Maybe the bandwidth between chips is less important.

I mean, in a cluster you might have a bunch of nodes with 8x GPUs hanging off each. If this thing replaces a whole node rather than a single GPU, which I assume is the case, it's not really a useful comparison, right?