by wmf 932 days ago
Limited memory, poor performance, etc. If you flip these performance improvement announcements they become damning; e.g. a 6x improvement means that they were previously running at less than 1/6th of optimal performance.
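The "flip" is simple arithmetic. The sketch below assumes the post-improvement system runs at some fraction of optimal, at most 1.0. It then divides by the speedup to bound where the system was before. The helper name `max_prior_fraction` is hypothetical and only illustrates the reasoning.

```python
def max_prior_fraction(speedup: float, current_fraction_of_optimal: float = 1.0) -> float:
    """Upper bound on how close to optimal a system ran before a given speedup.

    If the system now runs at `current_fraction_of_optimal` (at most 1.0) and that
    is `speedup` times faster than before, it previously ran at that fraction
    divided by the speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0 < current_fraction_of_optimal <= 1:
        raise ValueError("current fraction must be in (0, 1]")
    return current_fraction_of_optimal / speedup


# A 6x improvement, assuming the improved system is exactly optimal:
print(f"{max_prior_fraction(6):.3f}")       # 0.167
# The same 6x improvement, if the improved system still only reaches half of optimal:
print(f"{max_prior_fraction(6, 0.5):.3f}")  # 0.083
```

The bound is exactly 1/6 only when the improved system is already optimal. If it still falls short, the earlier fraction was strictly lower, which is the "less than" in the comment.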

Not that I know anything about these things, but FTA:

"...To briefly recap: the Cerebras Wafer Scale Cluster extends the 40GB of on-chip memory of each CS-2 with over 12 Terabytes of external memory. Unlike GPUs which only have 80GB of memory for all weights and activations, we store weights in external memory and activations in on-chip memory. The CS-2 works on one layer of the model at a time and weights are streamed in as needed, hence the name – Weight Streaming. This model provides us with over 100x larger aggregate memory capacity than GPUs, allowing us to natively support trillion parameter models without resorting to complex model partitioning schemes such as tensor and pipeline parallelism."
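Here is a toy, single-device sketch of the idea the quote describes. Every layer's weights live in an external store and are streamed in one layer at a time, while the activations stay resident for the whole pass. This is not Cerebras's implementation or API. `ExternalWeightStore`, `forward_weight_streaming` and the tiny dense ReLU layers are all hypothetical and only illustrate the dataflow.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


class ExternalWeightStore:
    """Hypothetical stand-in for off-chip memory holding every layer's weights."""

    def __init__(self, layers: Dict[int, Matrix]):
        self._layers = layers
        self.fetches = 0  # how many layer loads were streamed in

    def num_layers(self) -> int:
        return len(self._layers)

    def fetch(self, layer: int) -> Matrix:
        self.fetches += 1
        return self._layers[layer]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def forward_weight_streaming(store: ExternalWeightStore, x: Vector) -> Vector:
    activations = x  # activations stay local ("on chip") for the whole pass
    for layer in range(store.num_layers()):
        weights = store.fetch(layer)  # stream in only this layer's weights
        activations = relu(matvec(weights, activations))
        del weights  # release the local reference; in this toy the store still owns the data
    return activations


store = ExternalWeightStore({
    0: [[1.0, 0.0], [0.0, 1.0]],
    1: [[2.0, -1.0], [1.0, 1.0]],
})
out = forward_weight_streaming(store, [3.0, 1.0])
print(out, store.fetches)  # [5.0, 4.0] 2
```

The point of the design is that the local weight footprint at any moment is one layer rather than the whole model. Total model size is then bounded by the external store instead of by on-chip memory, at the cost of moving every layer's weights across the link on each pass.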

It sounds like they just recently got weight streaming working well, which was my point: if you bought a CS-2 when it first came out, you couldn't really use the off-chip memory, and the on-chip memory wasn't enough to run LLMs.
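Some back-of-envelope arithmetic shows why on-chip memory alone falls short for LLMs. The 40 GB on-chip and 12 TB external figures come from the quoted article. The rest are assumptions, not from the thread: 2 bytes per parameter (fp16/bf16), weights only, decimal units, and illustrative model sizes. Training also needs gradients, optimizer state and activations, so real requirements are higher than these numbers.

```python
BYTES_PER_PARAM = 2  # assumption: fp16/bf16 weights only, no optimizer state
ON_CHIP_GB = 40      # per-CS-2 on-chip memory, from the quoted article
EXTERNAL_TB = 12     # external memory figure, from the quoted article


def weight_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Weight storage in decimal gigabytes for a model with `params` parameters."""
    return params * bytes_per_param / 1e9


# Illustrative model sizes (assumptions, not from the thread)
for name, params in [("13B", 13e9), ("175B", 175e9), ("1T", 1e12)]:
    gb = weight_gb(params)
    print(f"{name}: {gb:,.0f} GB of weights | fits on-chip: {gb <= ON_CHIP_GB} | "
          f"fits in external: {gb <= EXTERNAL_TB * 1000}")
```

Under these assumptions, a 13B model's weights (26 GB) fit on-chip. A 175B model (350 GB) and a 1T model (2,000 GB) do not fit on-chip, but both fit within 12 TB of external memory.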
> It sounds like they just recently got weight streaming working well

Even if it was working poorly, the CS-2 is still a lot of computer. The question is whether it was price-competitive with Nvidia at the time for the workloads it was acquired for.

Cerebras offers them in a batch-processing cloud-ish model, so their prices should reflect utility to some degree.

Interesting! But do you think that, now that this is solved, Cerebras will be adopted more widely?
It's hard to say. We'll probably never have good information due to NDAs.