
Hacker News

by HarHarVeryFunny 584 days ago

Also:

nets too small (not enough layers)

gradients not flowing (residual connections)

layer outputs not normalized

training algorithms and procedures not optimal (Adam, warm-up, etc.)
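On the residual-connections point, here is a toy illustration of why the identity path helps gradients flow. It is a sketch under assumptions of my own choosing (scalar layers computing tanh(w*h), w = 0.1, 20 layers; the name `input_gradient` is hypothetical), not any framework's implementation. With h <- h + f(h) the per-layer derivative is 1 + f'(h) instead of just f'(h), so the chain-rule product does not collapse toward zero.

```python
import math

def input_gradient(depth, w, residual):
    """Return d(h_depth)/d(h_0) for a chain of scalar layers via the chain rule.

    Toy model (an assumption for illustration): each layer computes
    f(h) = tanh(w * h). Without residuals h <- f(h); with residuals h <- h + f(h).
    """
    h, grad = 1.0, 1.0
    for _ in range(depth):
        t = math.tanh(w * h)
        local = w * (1.0 - t * t)   # derivative of tanh(w*h) w.r.t. h, at the current h
        if residual:
            local += 1.0            # the identity (skip) path contributes +1
            h = h + t
        else:
            h = t
        grad *= local               # chain rule: multiply the local derivatives
    return grad

plain = input_gradient(depth=20, w=0.1, residual=False)
res = input_gradient(depth=20, w=0.1, residual=True)
print(f"plain: {plain:.3e}  residual: {res:.3f}")
```

With these settings the plain chain's input gradient comes out around 1e-20, while the residual chain's stays above 1.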
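On normalizing layer outputs, a minimal layer-norm sketch in plain Python (layer norm is one of several normalization schemes; batch norm is another). It shifts each vector to zero mean and unit variance, then applies a per-feature gain `gamma` and bias `beta`, which would be learned in a real network. The defaults and `eps=1e-5` are illustrative assumptions.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one vector to zero mean / unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n        # population variance
    inv_std = 1.0 / math.sqrt(var + eps)             # eps guards against division by zero
    gamma = gamma if gamma is not None else [1.0] * n  # learnable gain (identity here)
    beta = beta if beta is not None else [0.0] * n     # learnable bias (zero here)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
print(y)
```

Because the input is centered and divided by its standard deviation, scaling it by a constant leaves the output nearly unchanged: `layer_norm([10.0, 20.0, 30.0, 40.0])` gives almost the same values as `y`, which keeps activations in a stable range as they pass through many layers.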
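On training procedure, a sketch of Adam with a linear learning-rate warm-up, minimizing the toy objective (x - 3)^2. beta1=0.9, beta2=0.999 and eps=1e-8 are the commonly used Adam defaults; the base learning rate, the 50 warm-up steps, holding the rate constant afterwards, and the helper names `warmup_lr` / `adam_minimize` are assumptions for illustration. Real schedules usually also decay the rate after warm-up.

```python
import math

def warmup_lr(step, base_lr, warmup_steps):
    # Linear warm-up: ramp from base_lr / warmup_steps up to base_lr, then hold constant.
    return base_lr * min(1.0, (step + 1) / warmup_steps)

def adam_minimize(grad_fn, x0, steps=500, base_lr=0.1, warmup_steps=50,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero-initialized m
        v_hat = v / (1 - beta2 ** t)           # bias correction for zero-initialized v
        lr = warmup_lr(t - 1, base_lr, warmup_steps)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Gradient of (x - 3)^2 is 2 * (x - 3); the minimum is at x = 3.
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_final)
```

One common explanation for warm-up is that Adam's second-moment estimate is based on very few samples in the first steps, so keeping the early learning rate small avoids large, poorly scaled updates.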