by keanzu 2317 days ago
We first chose to display only data points obtained after 20 or more epochs of training. Then, by slicing through the “loss” axis, we observed that larger learning rates led to better performance (lower perplexity). You can reproduce this example here:
https://facebookresearch.github.io/hiplot/_static/demo/ml1.c...
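
For anyone who wants to see the two steps outside the interactive plot, here is a rough sketch in plain Python. It does not use HiPlot itself; the records, the field names (`lr`, `epoch`, `loss`), the helper functions, and the loss cutoff of 4.5 are all made up for illustration and are not taken from the demo data.

```python
# Hypothetical training records; every value here is invented for illustration.
runs = [
    {"lr": 0.001, "epoch": 10, "loss": 5.2},
    {"lr": 0.001, "epoch": 25, "loss": 4.9},
    {"lr": 0.01, "epoch": 30, "loss": 4.1},
    {"lr": 0.1, "epoch": 5, "loss": 6.0},
    {"lr": 0.1, "epoch": 40, "loss": 3.7},
]


def filter_min_epochs(records, min_epochs=20):
    """Keep only data points obtained after at least `min_epochs` epochs."""
    return [r for r in records if r["epoch"] >= min_epochs]


def slice_axis(records, axis, low, high):
    """Keep records whose value on `axis` lies in the closed range [low, high]."""
    return [r for r in records if low <= r[axis] <= high]


# Step 1: restrict to points with 20 or more epochs of training.
trained = filter_min_epochs(runs)

# Step 2: slice the "loss" axis to the low-loss region (assumed cutoff).
low_loss = slice_axis(trained, "loss", 0.0, 4.5)

# Inspect which learning rates survive the slice.
lrs = sorted(r["lr"] for r in low_loss)
print(lrs)
```

With this toy data only the two larger learning rates remain after the slice, which is the kind of pattern the quoted passage describes; real results would of course depend on the actual experiment.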