by keanzu 2317 days ago
We first chose to display only data points obtained after 20 or more epochs of training. Then, by slicing through the “loss” axis, we observed that larger learning rates led to better performance (lower perplexity). You can reproduce this example here:

https://facebookresearch.github.io/hiplot/_static/demo/ml1.c...
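
For anyone who wants to see the two steps outside the UI, here is a rough sketch in plain Python: filter to runs with 20 or more epochs, then keep only a range on the loss axis and look at which learning rates survive. The records, the field names (`epoch`, `lr`, `loss`), and the helper functions are all hypothetical; the numbers are made up to mirror the quoted observation and are not taken from the demo data.

```python
# Made-up records for illustration only; not the data behind the demo.
runs = [
    {"epoch": 5,  "lr": 0.01, "loss": 7.0},
    {"epoch": 25, "lr": 0.01, "loss": 5.5},
    {"epoch": 30, "lr": 0.1,  "loss": 4.8},
    {"epoch": 10, "lr": 1.0,  "loss": 6.0},
    {"epoch": 22, "lr": 1.0,  "loss": 4.2},
    {"epoch": 40, "lr": 2.0,  "loss": 4.0},
]


def filter_min_epochs(rows, min_epochs=20):
    """Keep only data points recorded at or after `min_epochs` epochs."""
    return [r for r in rows if r["epoch"] >= min_epochs]


def slice_axis(rows, key, low, high):
    """Keep rows whose `key` value lies in [low, high], like dragging a
    range selection along one axis of a parallel-coordinates plot."""
    return [r for r in rows if low <= r[key] <= high]


late = filter_min_epochs(runs)                # step 1: 20 or more epochs
best = slice_axis(late, "loss", 0.0, 4.5)     # step 2: slice the low-loss range
lrs = sorted(r["lr"] for r in best)           # learning rates that remain

print(lrs)
```

If the data is already in this list-of-dicts shape, it can presumably be fed to HiPlot directly for the interactive version; the sketch only shows what the selection amounts to.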