	by moralestapia 384 days ago
	> Their hardware does inference with FP16, so they need ~20 of their CSE-3 chips to run this model.

	Care to explain? I don't see it.


acchow 384 days ago

CSE-3 chip has 44GB, which can hold 22B parameters in FP16.

400B parameters would need about 18 chips (400B / 22B ≈ 18.2). Then you need a bit more RAM for other stuff.
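A rough sketch of that arithmetic, assuming decimal gigabytes, 2 bytes per FP16 parameter, and weights only (no room reserved for the "other stuff"). The function names are made up for illustration:

```python
import math

GB = 10**9  # decimal gigabyte (assumption; the thread does not say GB vs GiB)


def params_per_chip(sram_bytes: int, bytes_per_param: int = 2) -> float:
    """Number of parameters that fit in on-chip SRAM (weights only)."""
    return sram_bytes / bytes_per_param


def chips_needed(total_params: int, sram_bytes: int, bytes_per_param: int = 2) -> int:
    """Whole chips needed to hold all weights, rounded up."""
    return math.ceil(total_params * bytes_per_param / sram_bytes)


# 44 GB of SRAM at 2 bytes per parameter holds 22B parameters.
print(params_per_chip(44 * GB) / 1e9)      # 22.0

# 400B parameters * 2 bytes = 800 GB; 800 / 44 ≈ 18.2, so 19 whole chips.
print(chips_needed(400 * 10**9, 44 * GB))  # 19
```

Rounding up to whole chips gives 19 before any extra memory is counted, which is in line with the ~20 figure quoted at the top of the thread.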


moralestapia 384 days ago

That's on-chip SRAM, comparable to a GPU's L1 cache, of which a GPU typically has only megabytes.

CSE systems also come with off-chip memory, comparable to a GPU's memory, but usually in the TB range.


ryao 384 days ago

The memory bandwidth for that is 150GB/sec. Inference speed is memory-bandwidth bound, so that memory is useless for inference. Discrete GPUs would run circles around the CSE-3 at inference if Cerebras tried using the external DRAM.
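To put a number on "memory-bandwidth bound", here is a back-of-the-envelope sketch. It assumes a dense model, batch size 1, and that every weight is read once per generated token, so it is only a rough upper bound and not a measurement. The function name is hypothetical:

```python
def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          params: float,
                          bytes_per_param: int = 2) -> float:
    """Upper bound on decode speed when each token requires one full
    pass over the weights (assumption: dense model, batch size 1)."""
    model_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# 400B parameters in FP16 is 800 GB of weights.
# Streaming them over a 150 GB/sec link allows at most:
print(max_tokens_per_second(150e9, 400e9))  # 0.1875 tokens/sec
```

Under those assumptions the external link tops out well below one token per second for a model this size.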


moralestapia 384 days ago

Where do you get those 150GB/sec from?

Here [1] they imply they can reach 1.2Tbps (allegedly, I know), and that's the previous generation ...

1: https://f.hubspotusercontent30.net/hubfs/8968533/Virtual%20B...


ryao 384 days ago

The other comment already clarified that 150GB/sec = 1.2Tbps. That said, the CSE-3 did not change this figure. It is buried in their specification sheets somewhere if you care to search for it. I did last year, which is how I know.


rkomorn 384 days ago

Doesn't 1.2 Tbps / 8 = 150 GB/s, because 8 bits = 1 byte?
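A quick check of that conversion, assuming decimal prefixes (1 Tb = 10^12 bits, 1 GB = 10^9 bytes). The function name is just for illustration:

```python
def tbps_to_gbytes_per_s(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second (decimal units)."""
    bits_per_s = tbps * 1e12
    bytes_per_s = bits_per_s / 8  # 8 bits per byte
    return bytes_per_s / 1e9


print(round(tbps_to_gbytes_per_s(1.2), 3))  # 150.0
```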


moralestapia 384 days ago

That's ... right! Huh, missed that (assuming all units were written properly and mean what they mean).

Edit: yeah, double checked their site and everything. Dang, their IO is indeed "slow". They claim 1 microsecond latencies, but still, an H100 can move much more data than that.


acchow 384 days ago

If you want the 2500 tokens/second from the title, you need to use the on-chip SRAM.


moralestapia 384 days ago

What?

Of course they're using the on-chip SRAM, why wouldn't they?

This is a press release from Cerebras about a Cerebras chip, ... of course they are using a Cerebras chip!

Is that not obvious?


ryao 384 days ago

They also support external DRAM over their 150GB/sec system IO link. They call it MemoryX and talk about it in these blog posts:

https://www.cerebras.ai/blog/cerebras-cs-3-vs-nvidia-b200-20...

https://www.cerebras.ai/blog/announcing-the-cerebras-archite...

It is useless for inference, but it is great for training. It used to be more prominent on their website, but it is harder to find references to it now that they are mimicking Groq’s business model.
