| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hashta 50 days ago
	Interesting read. I remember the grokking paper when it came out but I don't think I've ever seen that classic grokking loss curve in my own hands on real data. Curious if others have seen it more often in practice

1 comments

yorwba 50 days ago

To get pure grokking, you need a model large enough to easily memorize the entire training data and keep training for a long time after memorization. In practice, you'll probably use a more realistically-sized model that might grok on some subset of the data, but not so strongly that it's extremely obvious.

link

hashta 50 days ago

I think I trained models with #params >> #training examples for hundreds of epochs, but still don't recall seeing that loss curve on real data. Curious if others have seen it with larger models or much longer runs

link