by acchow 384 days ago
If you want the titled 2500 tokens/second, you need to use the on-chip SRAM

What?

Of course they're using the on-chip SRAM, why wouldn't they?

This is a press release from Cerebras about a Cerebras chip, so of course they are using a Cerebras chip!

Is that not obvious?

They also support external DRAM over their 150GB/sec system I/O link. They call it MemoryX and discuss it in these blog posts:

https://www.cerebras.ai/blog/cerebras-cs-3-vs-nvidia-b200-20...

https://www.cerebras.ai/blog/announcing-the-cerebras-archite...

MemoryX is useless for inference, but it is great for training. It used to be more prominent on their website, but references to it are harder to find now that they are mimicking Groq’s business model.
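
To put rough numbers on why the external-DRAM link does not help inference, here is a back-of-envelope sketch. It assumes single-stream decoding where every weight has to cross the link once per generated token, and it uses a hypothetical 70B-parameter model with 16-bit weights. The model size and precision are my assumptions, not figures from the press release. Only the 150GB/sec link speed and the 2500 tokens/second target come from this thread.

```python
def max_tokens_per_second(link_bytes_per_s: float, model_bytes: float) -> float:
    """Upper bound on decode rate if all weights cross the link once per token."""
    return link_bytes_per_s / model_bytes


def bandwidth_needed(tokens_per_s: float, model_bytes: float) -> float:
    """Weight bandwidth (bytes/s) needed to sustain a given decode rate."""
    return tokens_per_s * model_bytes


LINK_BW = 150e9          # 150 GB/s link, as quoted in the comment above
PARAMS = 70e9            # ASSUMPTION: hypothetical 70B-parameter model
BYTES_PER_PARAM = 2      # ASSUMPTION: 16-bit weights
TARGET_RATE = 2500       # tokens/second, from the quoted parent comment

model_bytes = PARAMS * BYTES_PER_PARAM
bound = max_tokens_per_second(LINK_BW, model_bytes)
needed = bandwidth_needed(TARGET_RATE, model_bytes)

print(f"link-bound decode rate: {bound:.2f} tokens/s")
print(f"bandwidth for {TARGET_RATE} tokens/s: {needed / 1e12:.0f} TB/s")
```

Under those assumptions the link caps out at about 1 token/second, and 2500 tokens/second would need on the order of 350 TB/s of weight bandwidth. Batching lets many requests share one pass over the weights, so this is only a crude per-stream bound, but it shows the size of the gap between an external link and on-chip SRAM.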