


	by baq 755 days ago
	My takeaway from the paper is that you can guide training by adding or switching to a more difficult loss function once the model has the basics right. It looks like they never trained long enough to reach the overfitting/grokking regime, so maybe there's more to discover further down the training alley.
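
	To make the "switch to a harder loss once the basics are right" idea concrete, here is a minimal sketch in plain Python. It is not the paper's method: `easy_loss`, `hard_loss`, `pick_loss`, and the `threshold`/`window` values are all hypothetical names and numbers chosen for illustration. The only point is that the objective changes once the easy loss has stayed low for a few steps.

```python
def easy_loss(pred, target):
    """Mean squared error: stands in for the 'basic' objective."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def hard_loss(pred, target):
    """Hypothetical stricter objective: MSE plus the worst-case absolute error."""
    worst = max(abs(p - t) for p, t in zip(pred, target))
    return easy_loss(pred, target) + worst


def pick_loss(easy_loss_history, threshold=0.05, window=3):
    """Return hard_loss once the easy loss has been below `threshold`
    for the last `window` recorded steps; otherwise return easy_loss.
    Both `threshold` and `window` are illustrative assumptions."""
    recent = easy_loss_history[-window:]
    if len(recent) == window and all(value < threshold for value in recent):
        return hard_loss
    return easy_loss
```

	In a training loop you would append the easy loss to a history list each step and call `pick_loss(history)` to decide which objective to backpropagate. Whether to switch or to add the harder term on top is a design choice; this sketch adds it.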