

	by dartos 890 days ago
	Well, doesn’t the compute time for transformers scale roughly quadratically with model size? If so, would it make sense for power consumption to scale roughly quadratically too?

1 comment

tomjohnneill 890 days ago

I'm not sure. The figures I've seen suggest that GPT-3 required 10x more energy to train than GPT-2 (e.g. https://www.nnlabs.org/power-requirements-of-large-language-....), so I think an increase of roughly 1-2 orders of magnitude in energy usage from GPT-2 to GPT-3.5 makes sense.
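
For a rough sanity check of the quadratic idea, here is a back-of-the-envelope sketch in Python. The parameter counts (1.5B for the largest GPT-2, 175B for GPT-3) are my assumption from the publicly reported figures, not something from the linked page, and the 10x energy ratio is just the number quoted above. It compares what quadratic scaling in model size would predict with the exponent that a 10x ratio would actually imply.

```python
import math

# Assumed parameter counts (publicly reported figures, not from this thread).
GPT2_PARAMS = 1.5e9
GPT3_PARAMS = 175e9

# Training-energy ratio (GPT-3 vs GPT-2) as quoted in the comment above.
REPORTED_ENERGY_RATIO = 10.0


def implied_exponent(size_ratio: float, energy_ratio: float) -> float:
    """Solve energy_ratio = size_ratio ** k for k."""
    return math.log(energy_ratio) / math.log(size_ratio)


size_ratio = GPT3_PARAMS / GPT2_PARAMS

# What the energy ratio would be if energy scaled with the square of model size.
quadratic_prediction = size_ratio ** 2

# The scaling exponent that the quoted 10x figure would imply instead.
k = implied_exponent(size_ratio, REPORTED_ENERGY_RATIO)

print(f"model size ratio: {size_ratio:.0f}x")
print(f"energy ratio if quadratic in size: {quadratic_prediction:.0f}x")
print(f"exponent implied by a 10x energy ratio: {k:.2f}")
```

Under those assumptions, quadratic scaling in model size would predict an energy increase of several orders of magnitude, far more than the quoted 10x, so the figures I've seen don't look quadratic. This ignores differences in training data, hardware, and efficiency between the two runs, so treat it as an illustration only.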
