

	by wmf 932 days ago
	Limited memory, poor performance, etc. If you flip these performance improvement announcements they become damning; e.g. a 6x improvement means that they were previously running at no more than 1/6th of optimal performance.


sillywalk 932 days ago

Not that I know anything about these things, but FTA:

"... To briefly recap: the Cerebras Wafer Scale Cluster extends the 40GB of on-chip memory of each CS-2 with over 12 Terabytes of external memory. Unlike GPUs which only have 80GB of memory for all weights and activations, we store weights in external memory and activations in on-chip memory. The CS-2 works on one layer of the model at a time and weights are streamed in as needed, hence the name – Weight Streaming. This model provides us with over 100x larger aggregate memory capacity than GPUs, allowing us to natively support trillion parameter models without resorting to complex model partitioning schemes such as tensor and pipeline parallelism."
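
To put rough numbers on the quote, here is a minimal Python sketch. It is a toy illustration and not Cerebras code: the function names are hypothetical, the 2 bytes per parameter is my assumption (the quote does not state a precision), and the capacities are the quoted figures read as decimal gigabytes.

```python
# Figures taken from the quoted passage (decimal GB assumed).
ON_CHIP_GB = 40        # on-chip memory per CS-2
EXTERNAL_GB = 12_000   # "over 12 Terabytes" of external memory
GPU_GB = 80            # per-GPU memory cited in the quote


def capacity_ratio(external_gb, gpu_gb):
    """How many times larger the external store is than one GPU's memory."""
    return external_gb / gpu_gb


def weights_gb(num_params, bytes_per_param=2):
    """Size of the weights alone; 2 bytes/param is an assumption (16-bit)."""
    return num_params * bytes_per_param / 1e9


def scale(weights, activation):
    """Stand-in 'layer': elementwise multiply, just to have something to run."""
    return [w * a for w, a in zip(weights, activation)]


def stream_forward(layer_weights, activation, apply_layer):
    """Layer-at-a-time forward pass.

    `layer_weights` is any iterable (it stands in for external memory), so
    only one layer's weights are held at once while the activation stays
    resident throughout.
    """
    peak_resident = 0
    for weights in layer_weights:
        peak_resident = max(peak_resident, len(weights))
        activation = apply_layer(weights, activation)
    return activation, peak_resident


if __name__ == "__main__":
    print(capacity_ratio(EXTERNAL_GB, GPU_GB))   # ratio vs. a single 80GB GPU
    print(weights_gb(1e12), "GB of weights vs", ON_CHIP_GB, "GB on-chip")
    layers = iter([[2.0, 2.0], [0.5, 3.0]])      # consumed one layer at a time
    print(stream_forward(layers, [1.0, 1.0], scale))
```

Under those assumptions, the weights of a trillion-parameter model are far larger than the 40GB on-chip memory but fit in the external store, which is why streaming one layer at a time matters.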


wmf 932 days ago

It sounds like they just recently got weight streaming working well, which was my point: if you bought a CS-2 when it first came out, you couldn't really use the off-chip memory, and the on-chip memory wasn't enough to run LLMs.


rbanffy 931 days ago

> It sounds like they just recently got weight streaming working well

Even if it was working poorly, the CS-2 is still a lot of computer. The question is whether it was price-competitive with Nvidia at the time for the workloads it was acquired for.

Cerebras offers them in a batch-processing cloud-ish model, so their prices should reflect utility to some degree.


efrank02 931 days ago

Interesting! But do you think that, now that this is solved, Cerebras will be adopted more widely?


wmf 931 days ago

It's hard to say. We'll probably never have good information due to NDAs.
