HN Mirror



	by Der_Einzige 2184 days ago
	Yet another paper with results that basically look like this: https://d3b8hk1o42ev08.cloudfront.net/wp-content/uploads/201... Still impressive, don't get me wrong, but I am starting to believe that NLP will be dominated increasingly by the big players since they are the only ones who can train a 1 TRILLION parameter model (they show that in the paper). I can't even do inference with a 36 layer, 2048 neuron per layer network with my GTX 2080ti. Sad....

2 comments

rahimnathwani 2184 days ago

"I can't even do inference with a 36 layer, 2048 neuron per layer network with my GTX 2080ti."

Not even for a single instance? Your GPU has 11GB of RAM. With 36 * 2048 neurons, that's roughly 150KB per neuron. Why isn't that enough? Is the input really large, or does each neuron have very high precision?

MrUssek 2184 days ago

There's an extremely large number of parameters per "neuron". The 600B parameters will take up more than 1TB of memory, far too much for the 2080 Ti or even the main memory of most systems.

rahimnathwani 2184 days ago

I'm not talking about inference on a 600B parameter model. GP said they can't do inference on a 36-layer, 2048-neurons-per-layer network. Let's assume every layer is fully connected, so each neuron has 2048 parameters. That's 36 * 2048 * 2048 parameters, or about 151MM parameters in 11GB of RAM: roughly 73 bytes available per parameter. If each parameter is 4 bytes (that seems like a lot of precision), plus 4 bytes per calculated value, you're still only using about 11% of the GPU's RAM. You should be able to do inference on a batch of 16-20 examples at a time.

What have I missed?
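
For anyone who wants to check the back-of-the-envelope numbers, here is a minimal sketch of that estimate using the 36-layer, 2048-wide figure from the top comment. It assumes square fully connected layers with no biases, and it treats 11GB as decimal gigabytes; the function name and constants are illustrative only.

```python
def dense_param_count(layers: int, width: int) -> int:
    """Weights in a stack of fully connected width x width layers (biases ignored)."""
    return layers * width * width


# Assumption: 11 GB taken as decimal gigabytes.
GPU_BYTES = 11 * 10**9

params = dense_param_count(layers=36, width=2048)

# Memory available per parameter if the whole card held nothing else.
budget_bytes = GPU_BYTES / params

# 4-byte weight plus 4 bytes per calculated value, as assumed in the comment.
used_fraction = 8 / budget_bytes

print(f"{params:,} parameters")
print(f"{budget_bytes:.1f} bytes available per parameter")
print(f"{used_fraction:.1%} of GPU memory used at 8 bytes per parameter")
```

Under these assumptions the model itself takes up only a small slice of the card, which is why the estimate hinges on whether "2048 neurons per layer" really describes the network.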

tehsauce 2184 days ago

"2048 neurons per layer" isn't really an accurate description; what he means is 2048-dimensional embeddings at each layer. The actual multi-head attention layers in a transformer are not just a 2048*2048 feed-forward matrix, but have many more parameters. That's why there's 600B total.
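
To make the per-layer difference concrete, here is a rough sketch that counts the weights in one generic dense transformer layer. It is not the architecture from the paper: it assumes four attention projections (Q, K, V, output) of size d*d and a feed-forward block with a hidden size of 4*d (the `ffn_mult` default is an assumption), and it ignores biases, layer norms and embeddings.

```python
def transformer_layer_params(d_model: int, ffn_mult: int = 4) -> int:
    """Approximate weight count for one dense transformer layer.

    Assumptions (not taken from the paper under discussion):
    - attention uses four d_model x d_model projections (Q, K, V, output)
    - the feed-forward block expands to ffn_mult * d_model and back
    - biases, layer norms and embeddings are ignored
    """
    attention = 4 * d_model * d_model
    feed_forward = 2 * ffn_mult * d_model * d_model
    return attention + feed_forward


d_model = 2048
naive_per_layer = d_model * d_model            # the "2048 neurons" reading
per_layer = transformer_layer_params(d_model)  # the dense transformer reading
total_36_layers = 36 * per_layer

print(f"naive per layer:       {naive_per_layer:,}")
print(f"transformer per layer: {per_layer:,}")
print(f"ratio:                 {per_layer // naive_per_layer}x")
print(f"36 layers:             {total_36_layers:,}")
```

With these assumptions each layer holds 12 times the naive 2048*2048 count. A dense count like this still does not account for the 600B figure on its own.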

fhssn1 2184 days ago

Stay tuned for algorithmic advancements.
