Hacker News
by BobbyJo 1330 days ago
I'd like to know why there is a difference in the final loss at all. If the two networks had the same architecture, used the same loss function, and both used random uniform initialization, then 1000 epochs should have them converging on very similar final loss values, especially if one of them was able to converge to 3e-4.
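
One way to check this kind of claim empirically is to train the same network twice, changing only the seed. This is a minimal sketch, not the setup being discussed. The toy task (fitting sin(3x)), the 8-unit tanh layer, the learning rate, and the `train` helper are all assumptions made up for illustration.

```python
import numpy as np


def train(seed, epochs=1000, hidden=8, lr=0.02):
    """Train a tiny tanh MLP on a toy regression task; return the loss per epoch."""
    rng = np.random.default_rng(seed)

    # Toy data (an assumption for illustration): fit y = sin(3x) on [-1, 1].
    x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
    y = np.sin(3.0 * x)

    # Uniform random initialization; only the seed differs between runs.
    w1 = rng.uniform(-0.5, 0.5, size=(1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.uniform(-0.5, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)

    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = np.tanh(x @ w1 + b1)
        err = (h @ w2 + b2) - y
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for mean squared error.
        g_out = 2.0 * err / len(x)
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_z = (g_out @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_z
        g_b1 = g_z.sum(axis=0)

        # Full-batch gradient descent step.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2

    return losses


if __name__ == "__main__":
    # Same architecture, same loss, same epoch count; different seeds.
    for seed in (0, 1):
        losses = train(seed)
        print(f"seed={seed}  first={losses[0]:.4f}  final={losses[-1]:.4f}")
```

Comparing the two final values, and repeating over more seeds, shows how much spread initialization alone produces on a given problem. A toy run like this says nothing definitive about the networks in question.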