
by cma 539 days ago

>the time it takes it to learn what works/doesn't work widens.

From the raw scaling laws we already knew that a new base model might peter out in this run or the next, with some uncertainty about which--"the intersection point is sensitive to the precise power-law parameters":

https://gwern.net/doc/ai/nn/transformer/gpt/2020-kaplan-figu...
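
To make the "sensitive to the precise power-law parameters" point concrete, here is a rough sketch. It assumes two trend lines of the form a * x**(-b) and solves for where they cross. The coefficients and exponents are made-up illustration values, not the fitted numbers from the paper, and `intersection` is just a helper name for this sketch.

```python
def intersection(a1, b1, a2, b2):
    """Return x where a1 * x**(-b1) == a2 * x**(-b2).

    Setting the two power laws equal gives x**(b1 - b2) = a1 / a2,
    so x = (a1 / a2) ** (1 / (b1 - b2)).
    """
    if b1 == b2:
        raise ValueError("power laws with equal exponents never cross")
    return (a1 / a2) ** (1.0 / (b1 - b2))


# Hypothetical parameters, chosen only to illustrate the sensitivity.
a1, a2 = 2.0, 1.0
b2 = 0.040

base = intersection(a1, 0.050, a2, b2)     # exponent as "fitted"
shifted = intersection(a1, 0.055, a2, b2)  # same exponent nudged by 10%

print(f"crossing with b1=0.050: x = {base:.3e}")
print(f"crossing with b1=0.055: x = {shifted:.3e}")
print(f"ratio between the two:  {base / shifted:.3e}")
```

With these toy numbers, a 10% change in one exponent moves the crossing point by about ten orders of magnitude, which is why "this run or the next" is about as precise as the extrapolation can get.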

A later graph shows where GPT-3 got to:

https://gwern.net/doc/ai/nn/transformer/gpt/2020-brown-figur...

https://gwern.net/scaling-hypothesis