Hacker News
by Veedrac 2500 days ago
They say they support efficient execution of smaller batches. They cover this somewhat in their Hot Chips talk, e.g. “One instance of NN, don't have to increase batch size to get cluster scale perf” from the AnandTech coverage.

If this doesn't answer your question, I'm not sure what you're asking. They use SRAM because it's the only tried-and-true option that works. Lots of SRAM means efficient execution of small batch sizes. If your problem fits, good: this chip works for you, and probably easily outperforms a cluster of 50 GPUs. If it doesn't fit, presumably you should just use something else.