| HN Mirror

	by lopuhin 853 days ago
	If convergence were a matter of luck, it would look completely different, like white noise, but it clearly has well-defined structure. The reason for the high learning rate is that they used full-batch training (see the first cell in https://colab.research.google.com/github/Sohl-Dickstein/frac...), and when batch sizes are large, learning rates can typically be large as well. Plus, as others said, it's more of a toy problem; it would be hard to get such detail on anything non-toy.
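
	To illustrate the "not luck" point, here is a minimal sketch, not the notebook's code. It assumes a made-up 1-D least-squares problem and a hypothetical helper `full_batch_gd`. With full-batch gradients there is no sampling noise, so whether a run converges is a deterministic function of the learning rate. For this quadratic loss the cutoff is exactly 2 / curvature.

```python
# Toy illustration only: full-batch gradient descent on mean((w*x - y)**2) / 2.
# The data, the helper names, and the learning rates are all assumptions.

def full_batch_grad(xs, ys, w):
    """Gradient of the loss over the whole dataset (no minibatch sampling)."""
    return sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def full_batch_gd(xs, ys, lr, steps=1000, w0=0.0):
    """Return (final_w, converged) for a fixed learning rate."""
    w = w0
    for _ in range(steps):
        grad = full_batch_grad(xs, ys, w)
        if abs(grad) > 1e12:  # treat a runaway gradient as divergence
            return w, False
        w -= lr * grad
    return w, abs(full_batch_grad(xs, ys, w)) < 1e-6

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so the optimum is w = 2

# For a quadratic loss, gradient descent converges iff 0 < lr < 2 / curvature.
curvature = sum(x * x for x in xs) / len(xs)
threshold = 2.0 / curvature

results = {lr: full_batch_gd(xs, ys, lr)[1] for lr in (0.1, 0.3, 0.42, 0.44, 0.6)}

if __name__ == "__main__":
    print(f"threshold = {threshold:.4f}")
    for lr, ok in results.items():
        print(f"lr={lr}: {'converges' if ok else 'diverges'}")
```

	Rerunning this gives the same verdict for every learning rate, with a sharp cutoff rather than a random scatter. A real network is not quadratic, so its boundary can be far more intricate, but it is still deterministic under full-batch training.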