by sdenton4
46 days ago
I dunno... gradient descent is only really reliable with a big bag of tricks. Knowing good initializations is a starting point, but residual connections and batch/layer normalization go a very long way towards making it reliable.
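For anyone who hasn't seen these tricks combined, here's a rough sketch of a "pre-norm" residual block in plain Python. The input is layer-normalized, passed through a sublayer, and the result is added back onto the untouched input. Because of that skip path, the block stays close to the identity when the sublayer is small, which is part of why deep stacks of these train more reliably. The names `layer_norm`, `residual_block`, and the toy `sublayer` are hypothetical and for illustration only. Real layer norm also has learned gain/bias parameters, and a real sublayer would be a learned layer, not a fixed scaling.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (roughly) unit variance.

    Simplified: omits the learned gain and bias a real LayerNorm has.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x, sublayer):
    """Pre-norm residual block: x + sublayer(layer_norm(x)).

    The identity path (x) lets signal and gradients skip the sublayer.
    """
    h = sublayer(layer_norm(x))
    return [xi + hi for xi, hi in zip(x, h)]


def sublayer(v):
    # Hypothetical stand-in for a learned layer: a small elementwise scaling.
    return [0.1 * vi for vi in v]


x = [1.0, 2.0, 3.0, 4.0]
y = residual_block(x, sublayer)
print(y)
```

If the sublayer outputs all zeros, the block returns its input unchanged. That "start near the identity" behavior is one reason residual stacks plus normalization are less sensitive to initialization.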