by moralestapia 384 days ago
>Their hardware does inference with FP16, so they need ~20 of their CSE-3 chips to run this model.

Care to explain? I don't see it.

CSE-3 chip has 44GB, which can hold 22B parameters in FP16.

400B parameters at 2 bytes each is 800GB, which works out to about 18 chips (800 / 44 ≈ 18.2). Then you need a bit more RAM for other stuff.
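
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes decimal gigabytes (44GB = 44e9 bytes), 2 bytes per FP16 parameter, and counts weights only (no KV cache or activations). The function name is made up for illustration.

```python
import math

SRAM_BYTES_PER_CHIP = 44_000_000_000   # 44GB on-chip SRAM, decimal GB assumed
BYTES_PER_FP16_PARAM = 2               # FP16 = 16 bits = 2 bytes

def chips_needed(params: int) -> int:
    """Smallest whole number of chips whose combined SRAM holds the weights alone."""
    weight_bytes = params * BYTES_PER_FP16_PARAM
    return math.ceil(weight_bytes / SRAM_BYTES_PER_CHIP)

# Parameters that fit on one chip in FP16: 44e9 / 2 = 22e9
print(SRAM_BYTES_PER_CHIP // BYTES_PER_FP16_PARAM)

# 400B parameters: 800e9 / 44e9 is about 18.2, which rounds up to a whole chip count
print(chips_needed(400_000_000_000))
```

Rounding up to whole chips and leaving headroom for "other stuff" is how the estimate ends up near the ~20 quoted at the top.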

That's on-chip SRAM, comparable to a GPU's L1 cache, of which a GPU typically has only megabytes.

CSE systems also come with off-chip memory, comparable to a GPU's memory, but usually in the TB range.

The memory bandwidth for that is 150GB/sec. Inference speed is memory-bandwidth bound, so that memory is useless for inference. Discrete GPUs would run circles around the CSE-3 at inference if it tried using the external DRAM.

Where do you get that 150GB/sec from?

Here [1] they imply they can reach 1.2Tbps (allegedly, I know), and that's the previous generation ...

1: https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20B...

The other comment already clarified that 150GB/sec = 1.2Tbps. That said, the CSE-3 did not change this figure. It is buried in their specification sheets somewhere if you care to search for it. I did last year, which is how I know.

Doesn't 1.2Tbps / 8 = 150GB/sec, because 8 bits = 1 byte?

That's ... right! Huh, missed that (assuming all units were written properly and mean what they mean).

Edit: yeah, double checked their site and everything. Dang, their IO is indeed "slow". They claim 1 microsecond latencies, but still, an H100 can move much more data than that.
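
The unit conversion that tripped things up, as a tiny sketch. It assumes decimal prefixes (tera = 1e12, giga = 1e9) and that "Tbps" really means terabits per second; the function name is just illustrative.

```python
def tbps_to_gb_per_sec(terabits_per_sec: float) -> float:
    """Convert terabits per second to gigabytes per second (8 bits per byte)."""
    bits_per_sec = terabits_per_sec * 1e12
    bytes_per_sec = bits_per_sec / 8
    return bytes_per_sec / 1e9

# 1.2 Tbps of system IO expressed in GB/sec: approximately 150
print(tbps_to_gb_per_sec(1.2))
```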

If you want the 2500 tokens/second from the title, you need to use the on-chip SRAM.

What?

Of course they're using the on-chip SRAM, why wouldn't they?

This is a press release from Cerebras about a Cerebras chip, ... of course they are using a Cerebras chip!

Is that not obvious?

They also support external DRAM over their 150GB/sec system IO link. They call it MemoryX and talk about it in these blog posts:

https://www.cerebras.ai/blog/cerebras-cs-3-vs-nvidia-b200-20...

https://www.cerebras.ai/blog/announcing-the-cerebras-archite...

It is useless for inference, but it is great for training. It used to be more prominent on their website, but it is harder to find references to it now that they are mimicking Groq’s business model.
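
A back-of-envelope sketch of why the external link is "useless for inference", assuming single-stream decoding where each generated token has to stream the relevant weights from memory once. The 800GB figure assumes a dense 400B-parameter model in FP16; for a mixture-of-experts model only the active parameters would count, so treat the numbers as illustrative, not as a measurement. The function names are hypothetical.

```python
def max_tokens_per_sec(bandwidth_bytes_per_sec: float, bytes_read_per_token: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the only limit."""
    return bandwidth_bytes_per_sec / bytes_read_per_token

def required_bandwidth(tokens_per_sec: float, bytes_read_per_token: float) -> float:
    """Bandwidth in bytes/sec needed to sustain a given decode speed."""
    return tokens_per_sec * bytes_read_per_token

LINK_BYTES_PER_SEC = 150e9   # 150GB/sec system IO link, from the thread
WEIGHT_BYTES = 400e9 * 2     # assumption: dense model, every FP16 weight read per token

# Tokens/sec ceiling if weights lived in external DRAM behind the IO link
print(max_tokens_per_sec(LINK_BYTES_PER_SEC, WEIGHT_BYTES))

# Bandwidth in TB/sec needed for 2500 tokens/sec under the same assumption
print(required_bandwidth(2500, WEIGHT_BYTES) / 1e12)
```

Under those assumptions the link tops out well below one token per second, while 2500 tokens/second needs bandwidth on the order of thousands of TB/sec, which is why the weights have to sit in on-chip SRAM.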