HN Mirror



	by tehsauce 755 days ago
	Grokking is a sudden, large jump in test accuracy late in training, well after training accuracy has converged. Double descent is test performance improving, degrading, and then improving again as the number of model parameters increases.
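To make the two definitions concrete, here is a rough Python sketch that checks logged curves for each pattern. The function names, the 0.95 threshold, and the idea of reducing a curve to "up"/"down" runs are illustrative assumptions, not anything taken from the paper; real curves are noisy and would need smoothing first.

```python
from typing import List, Optional, Sequence


def first_step_at_least(values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first index whose value reaches the threshold, or None."""
    for step, value in enumerate(values):
        if value >= threshold:
            return step
    return None


def grokking_delay(
    train_acc: Sequence[float],
    test_acc: Sequence[float],
    threshold: float = 0.95,  # assumed cutoff for "converged"
) -> Optional[int]:
    """Steps between train accuracy and test accuracy reaching the threshold.

    Both curves are indexed by training step. A large positive delay is the
    grokking signature: test accuracy jumps long after train accuracy has.
    """
    train_step = first_step_at_least(train_acc, threshold)
    test_step = first_step_at_least(test_acc, threshold)
    if train_step is None or test_step is None:
        return None
    return test_step - train_step


def trend_directions(values: Sequence[float], tol: float = 0.0) -> List[str]:
    """Collapse a curve into its sequence of 'up' and 'down' runs."""
    directions: List[str] = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if abs(delta) <= tol:
            continue  # treat changes within tol as flat
        direction = "up" if delta > 0 else "down"
        if not directions or directions[-1] != direction:
            directions.append(direction)
    return directions


def looks_like_double_descent(test_acc_by_size: Sequence[float], tol: float = 0.0) -> bool:
    """True if test accuracy rises, falls, then rises again.

    Here the curve is indexed by model size (parameter count), not by step.
    """
    return trend_directions(test_acc_by_size, tol) == ["up", "down", "up"]
```

The key difference shows up in what the x-axis is: `grokking_delay` takes curves indexed by training step, while `looks_like_double_descent` takes one test curve indexed by increasing parameter count.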


scarmig 755 days ago

What they share is a subversion of the naive view that ML works simply by performing gradient descent over a loss landscape. Double descent subverts it by showing that test performance isn't monotonic in parameter count; grokking subverts it by showing that generalization can keep improving after training has converged.

I'd put the lottery ticket hypothesis in the same bucket of "things that may happen that don't make sense at all for a simple optimization procedure."


baq 754 days ago

My takeaway from the paper is that you can guide training by adding or switching to a more difficult loss function once you have the basics right. Looks like they never got to overfitting grokking, so maybe there’s more to discover further down the training alley.
