


	by keanzu 2364 days ago
	We first chose to display only the data points obtained after 20 or more epochs of training. Then, by slicing along the “loss” axis, we observed that larger learning rates led to better performance (lower perplexity). You can reproduce this example here: https://facebookresearch.github.io/hiplot/_static/demo/ml1.c...
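
	For anyone who wants to see the two steps outside the UI, here is a rough sketch in plain Python. It is not HiPlot's API: the field names (`epoch`, `lr`, `loss`), the helper names, and every number in `runs` are made up for illustration, and the loss range is an arbitrary choice.

```python
# Hypothetical training records. All values are invented for illustration
# and do not come from the demo linked above.
runs = [
    {"epoch": 5,  "lr": 0.001, "loss": 7.1},
    {"epoch": 20, "lr": 0.001, "loss": 5.4},
    {"epoch": 25, "lr": 0.01,  "loss": 4.8},
    {"epoch": 30, "lr": 0.1,   "loss": 4.2},
    {"epoch": 12, "lr": 0.1,   "loss": 5.0},
]


def filter_min_epoch(rows, min_epoch=20):
    """Keep only rows recorded at or after `min_epoch` epochs of training."""
    return [r for r in rows if r["epoch"] >= min_epoch]


def slice_axis(rows, key, low, high):
    """Keep rows whose `key` value lies in the closed range [low, high]."""
    return [r for r in rows if low <= r[key] <= high]


# Step 1: drop everything before 20 epochs.
late = filter_min_epoch(runs, min_epoch=20)

# Step 2: select a band on the "loss" axis and look at which learning
# rates remain in that band.
low_loss = slice_axis(late, "loss", 4.0, 5.0)

for row in low_loss:
    print(row["lr"], row["loss"])
```

	In the interactive plot the same thing is done by dragging a range on an axis; the list comprehensions here only mimic that selection.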