| HN Mirror



	by andreyk 1292 days ago
	I work in this field (PhD candidate), and what you say is true for smaller models, but not for GPT-3 scale models. Training large-scale models involves a lot more, as the OP said. It's not just learning rate schedulers, it's a whole bunch of stuff. See this logbook from training the GPT-3-sized OPT model - https://github.com/facebookresearch/metaseq/blob/main/projec...

3 comments

lossolo 1292 days ago

Seems like the majority of problems in this log are devops problems, which seem to come from a combination of ML people doing devops work without having devops experience and a really bad cloud vendor. I've been running multiple bare-metal nodes with 8 GPUs each, 24/7 for months at almost 100% utilization, and had 100x fewer problems than they had.


lucidrains 1292 days ago

it is neither as simple as the person you are responding to makes it out to be, nor as complicated as you make it seem. it will only get simpler with time.


marstall 1292 days ago

so creating each new rev of GPT-3 would involve going through something like all those messy steps in that logbook?
