by HarHarVeryFunny 584 days ago
Also:

nets too small (not enough layers)

gradients not flowing (later fixed by residual connections)

layer outputs not normalized

training algorithms and procedures not optimal (later improved by Adam, warm-up, etc.)
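
To make a few of the items above concrete, here is a rough pure-Python sketch of a residual connection, layer normalization, and a linear learning-rate warm-up. The function names (`layer_norm`, `residual_block`, `warmup_lr`) and the default values are my own illustrative choices, not taken from any particular library, and Adam itself is left out to keep it short.

```python
import math

def layer_norm(x, eps=1e-5):
    """Rescale a vector to roughly zero mean and unit variance.

    Simplified: no learned scale/shift parameters (an assumption for brevity).
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_block(x, f):
    """Pre-norm residual block: output = x + f(layer_norm(x)).

    The "x +" skip path passes the input through unchanged, so the
    gradient always has a direct route back to earlier layers.
    """
    fx = f(layer_norm(x))
    return [a + b for a, b in zip(x, fx)]

def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Ramp the learning rate linearly up to base_lr, then hold it constant."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    # A sub-layer that outputs zeros leaves the input untouched.
    print(residual_block(x, lambda h: [0.0 for _ in h]))
    print(warmup_lr(0), warmup_lr(499), warmup_lr(5000))
```

The point of the skip path is that even if `f` is badly behaved early in training, the block still behaves like the identity, which is what lets you stack many layers.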