by jakderrida 727 days ago

Exactly what I was thinking.

He also says

>Paradoxically, smaller models require more training to reach the same level of performance.

It's not a paradox at all. If smaller models needed less training to reach the same level of performance, that would be the paradox: you could keep shrinking both the model and the training run until a model trained for under a nanosecond gave the optimal performance/size payoff.
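To make that concrete, here is a rough sketch assuming a parametric loss of the form L(N, D) = E + A/N^alpha + B/D^beta, which is the kind of shape used in scaling-law fits. The constants below are illustrative placeholders I picked, not fitted values, and `loss` and `tokens_needed` are hypothetical helper names.

```python
import math

# Illustrative constants only (assumed for this sketch, not fitted to any real model family).
E, A, B = 1.7, 400.0, 400.0
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Assumed parametric loss: irreducible term + model-size term + data term."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_needed(n_params: float, target_loss: float) -> float:
    """Training tokens a model of n_params needs to hit target_loss.

    Returns math.inf when the model is too small to ever reach the target,
    because its size term alone already exceeds the available budget.
    """
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return math.inf
    return (B / gap) ** (1 / BETA)


if __name__ == "__main__":
    target = 2.2
    for n in (1e8, 1e9, 1e10):
        print(f"{n:.0e} params -> {tokens_needed(n, target):.3g} tokens")
```

Under these assumptions the smaller model always needs more tokens for the same target loss, and below some size no amount of training gets there at all.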