by exclipy 345 days ago
What happened to wafer inference hardware like Cerebras? Why isn't Claude being served from that if it's so much faster and energy efficient?
2 comments

I doubt Cerebras has even close to the scale to be a major player in this area.

Nvidia sold $35B of datacenter GPUs alone last year, the vast majority of which will be used for AI.

Cerebras’s entire revenue last year was only $78M. That’s roughly 450x smaller than Nvidia’s datacenter GPU business, or nearly three orders of magnitude. Scaling a company 10x in a year is a pretty hard thing to do, and it’s not a question of money, it’s a question of people and organisation. So much stuff in a business breaks when it scales 10x that it takes months to years to fix enough of it to support another 10x growth spurt without everything just imploding.
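The size of that gap can be checked with quick arithmetic. This is only a sketch using the two revenue figures quoted above; the "10x per year while Nvidia stands still" growth model is an assumption for illustration, not a forecast.

```python
import math

# Figures quoted in the comment above (USD, last year).
nvidia_datacenter_revenue = 35e9
cerebras_revenue = 78e6

# How many times larger Nvidia's datacenter GPU business is.
ratio = nvidia_datacenter_revenue / cerebras_revenue
orders_of_magnitude = math.log10(ratio)

# Assumption: Cerebras grows exactly 10x per year and Nvidia does not grow.
# Each such year closes one order of magnitude of the gap.
years_of_10x_growth = math.ceil(orders_of_magnitude)

print(f"ratio: {ratio:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
print(f"years of 10x growth to close the gap: {years_of_10x_growth}")
```

Even under that generous assumption, closing the gap means pulling off several consecutive 10x years, which is the organisational problem described above.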

There's also the question of whether they can keep up. Imagine not just selling that many GPUs, but selling that many new GPUs every few years for the same amount of money or more, while the previous generation of hardware becomes almost worthless.

The insane thing here is that $35B worth of GPUs will be worth more like $350M in a few years. Or less. Who can keep up with that?

Currently Cerebras, although faster, is more expensive than the traditional alternatives. Cursor's use case doesn't benefit from instant responses; users are happy to wait a few seconds (and watching the magic happen may even be beneficial).
How is it more expensive?
Fancy hardware with a bespoke production process, smaller economies of scale, and utilization that's probably not great, since they position themselves on per-user speed and have purportedly under-invested in their compiler, which has a hard job compiling for such an architecture anyway. That's ignoring for the moment the cost of their bespoke software stack, which they can probably amortize away eventually.
According to OpenRouter, Cerebras charges $0.65/$0.85 per 1M input/output tokens for Llama 4 Scout. Google charges $0.25/$0.70 and lambda.ai charges $0.08/$0.30 for the same model.
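To see what those per-token prices mean for a whole workload, here is a rough sketch. The prices are the ones quoted above; the workload size (3M input tokens, 1M output tokens) and the helper name `workload_cost` are made up for illustration.

```python
# USD per 1M tokens as (input, output) for Llama 4 Scout, as quoted above.
PRICES = {
    "Cerebras": (0.65, 0.85),
    "Google": (0.25, 0.70),
    "lambda.ai": (0.08, 0.30),
}

def workload_cost(provider, input_tokens, output_tokens):
    """Return the cost in USD of a workload at the provider's listed prices."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload (an assumption, not from the thread).
INPUT_TOKENS = 3_000_000
OUTPUT_TOKENS = 1_000_000

costs = {name: workload_cost(name, INPUT_TOKENS, OUTPUT_TOKENS) for name in PRICES}
baseline = costs["Cerebras"]

# List providers from cheapest to most expensive.
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: ${cost:.2f} (Cerebras costs {baseline / cost:.1f}x this)")
```

The exact multiple depends on the input/output mix, since Cerebras's premium is much larger on input tokens than on output tokens.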