HN Mirror

Hacker News


	by gwern 2539 days ago
	Looking at the whitepaper, I'm a little surprised how little RAM there is for such an enormous chip. Is the overall paradigm here that you still have relatively small minibatches during training, but each minibatch is now vastly faster?

4 comments

ivalm 2539 days ago

IIRC they use batch size = 1 and each core only knows about one layer, which is to say this thing has to be trained very differently from normal SGD (but requires very little memory). There is also the issue that they rely on sparsity, which you get with ReLU activations; if, for example, language models move to GELU activations, they will be somewhat screwed.
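A toy illustration of the ReLU-versus-GELU point, assuming (hypothetically) that the hardware can skip a multiply-accumulate (MAC) only when an input activation is exactly zero. It uses standard-normal pre-activations as a stand-in for real data. `sparse_matvec` and `demo` are made-up helpers for this sketch, not Cerebras's actual dataflow or API.

```python
import math
import random


def relu(x: float) -> float:
    # ReLU maps every non-positive input to exactly 0.0.
    return x if x > 0.0 else 0.0


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sparse_matvec(weights, activations):
    """Hypothetical zero-skipping mat-vec: compute weights @ activations but
    skip every column whose activation is exactly 0.0.
    Returns (output vector, number of multiply-accumulates performed)."""
    out = [0.0] * len(weights)
    macs = 0
    for j, a in enumerate(activations):
        if a == 0.0:
            continue  # a zero input contributes nothing to any output row
        for i, row in enumerate(weights):
            out[i] += row[j] * a
            macs += 1
    return out, macs


def demo(n_in=1000, n_out=64, seed=0):
    """Fraction of dense MACs still performed after zero-skipping,
    using standard-normal pre-activations as an assumed stand-in for real data."""
    rng = random.Random(seed)
    pre = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    dense_macs = n_in * n_out
    fractions = {}
    for name, act in (("relu", relu), ("gelu", gelu)):
        _, macs = sparse_matvec(weights, [act(x) for x in pre])
        fractions[name] = macs / dense_macs
    return fractions


if __name__ == "__main__":
    for name, frac in demo().items():
        print(f"{name}: {frac:.1%} of dense multiply-accumulates performed")
```

ReLU sends every negative input to exactly 0, so roughly half the columns get skipped here. GELU returns small but nonzero values for negative inputs, so in this sketch nothing is skipped at all.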


IshKebab 2539 days ago

It's because it's SRAM, not DRAM. Think about how much L3 cache your processor has: a few MB, probably. That's what this chip's memory is equivalent to.


morphle 2539 days ago

We have up to 160 GB of SRAM on our WSI. The rest of the transistors can be a few million cores or reconfigurable Morphle Logic (an open-hardware kind of FPGA).

Our startup has been working on full wafer-scale integration (WSI) since 2008. We are searching for cofounders: Merik at metamorphresearch dot org


Veedrac 2539 days ago

“full utilization at any batch size, including batch size 1”

https://www.cerebras.net/
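The "batch size 1" claim is easier to picture with a layer pipeline like the one ivalm describes (one layer per core). This is a minimal forward-only sketch under assumed simplifications: every stage takes one tick, one sample enters per tick, and backward passes and weight updates are ignored. `pipeline_utilization` is a hypothetical helper, not a description of how Cerebras actually schedules work.

```python
def pipeline_utilization(num_samples: int, num_stages: int) -> float:
    """Hypothetical forward-only layer pipeline: stage k holds layer k, one
    sample (batch size 1) enters stage 0 per tick, and every sample advances
    one stage per tick. Returns busy stage-ticks / total stage-ticks."""
    if num_samples < 1 or num_stages < 1:
        raise ValueError("need at least one sample and one stage")
    busy = 0
    tick = 0
    while True:
        # Sample s enters at tick s, so stage k works on sample (tick - k).
        active = sum(1 for k in range(num_stages) if 0 <= tick - k < num_samples)
        if active == 0:
            break  # the last sample has left the final stage
        busy += active
        tick += 1
    return busy / (tick * num_stages)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, pipeline_utilization(n, num_stages=8))
```

Under these assumptions, streaming N samples through L stages gives utilization N / (N + L - 1) (0.125 for a single sample through 8 stages), which approaches 1 as the stream gets longer regardless of batch size. The cost is fill/drain latency for each individual sample.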


gwern 2539 days ago

That doesn't really mean anything. It (and any other chip) had better be able to run at least batch size 1, and lots of people claim to have great utilization... It doesn't tell me whether the limited memory is part of a deliberate tradeoff akin to a throughput/latency tradeoff, or some intrinsic problem with the speedups coming from other design decisions like the sparsity multipliers, or what.


Veedrac 2539 days ago

Most of the chip is already SRAM; I'm not really sure what else you would expect.

18 GiB × 6 transistors/bit ≈ 0.93 trillion transistors
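The estimate above as a quick calculation, assuming a conventional 6-transistor (6T) SRAM cell and counting only the bit cells (no sense amplifiers, decoders, or redundancy). `sram_transistors` is a hypothetical helper for this sketch.

```python
GIB = 2 ** 30  # bytes per GiB


def sram_transistors(capacity_gib: int, transistors_per_bit: int = 6) -> int:
    """Transistors in the bit cells alone of an SRAM array of the given size.
    Assumes a 6T cell by default and ignores sense amps, decoders, redundancy."""
    bits = capacity_gib * GIB * 8
    return bits * transistors_per_bit


if __name__ == "__main__":
    t = sram_transistors(18)
    print(f"{t:,} transistors ≈ {t / 1e12:.2f} trillion")
    # prints: 927,712,935,936 transistors ≈ 0.93 trillion
```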


gwern 2539 days ago

Well, it could be... not SRAM? It's not the only kind of RAM, and the choice to use SRAM is certainly not an obvious one. It could make sense as part of a specific paradigm, but that is not explained, which is why I am asking. It may be perfectly obvious to you, but it's not to me.


Veedrac 2539 days ago

You basically have the option between SRAM, HBM (DRAM), and something new. You can imagine the risks with using new memory tech on a chip like this.

The issue with HBM is that it's much slower, much more power-hungry (per access, not per byte), and not local (so there are routing problems). You can't scale that to this much compute.


gwern 2538 days ago

But HBM and other RAMs are, presumably, vastly cheaper otherwise. (You can keep explaining that, but unless you work for Cerebras and haven't thought to mention that, talking about how SRAM is faster is not actually an answer to my question about what paradigm is intended by Cerebras.)
