by omegalulw
1344 days ago
This is not that relevant for ML. Each gradient pass recomputes the cost function and its gradients from the current parameters, so errors are unlikely to accumulate. The main thing is to avoid errors large enough to land you in a completely different part of the parameter space and derail progress, which is what the commenter above points out.
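A rough sketch of that argument, using a toy setup that is not from the thread: gradient descent on a one-dimensional quadratic, where every gradient is perturbed by a small relative error standing in for reduced-precision arithmetic. The cost function, the names `noisy_descent` and `noise`, and the 5% error level are all illustrative assumptions.

```python
import random

def loss(w):
    # Toy convex cost with its minimum at w = 3 (an assumption for illustration).
    return (w - 3.0) ** 2

def grad(w):
    # Exact gradient of the toy cost.
    return 2.0 * (w - 3.0)

def noisy_descent(steps=200, lr=0.1, noise=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # The gradient is recomputed from the current w on every pass.
        # The relative perturbation mimics rounding error on that one step.
        g = grad(w) * (1.0 + rng.uniform(-noise, noise))
        w -= lr * g
    return w

w_final = noisy_descent()
print(w_final, loss(w_final))
```

Every step here uses a slightly wrong gradient, yet the run still settles at the minimum, because the next gradient is computed fresh from wherever the parameters currently are. This only illustrates the well-behaved case; it says nothing about errors large enough to throw the parameters somewhere else entirely.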
|
I am familiarizing myself with recurrent neural networks, and getting them trained online is a pain: I get NaNs all the time, except at very small learning rates that prevent my networks from learning anything.
The deeper the network is, the more pronounced the accumulation of errors in online training becomes. Add 20-30 fully connected (not highway or residual) layers before the softmax and you'll see wonders: you won't be able to get anything stable.
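One way to see why depth hurts, as a minimal sketch and not a reproduction of the setup described above: push a random vector through a stack of plain linear fully connected layers and track its norm. The width, the `gain` scaling factor, and the helper names `make_layer` and `forward_norms` are assumptions chosen for illustration, and the layers have no nonlinearity, skip connections, or normalization.

```python
import math
import random

def make_layer(width, gain, rng):
    # Dense weight matrix with entries drawn from N(0, (gain / sqrt(width))^2).
    std = gain / math.sqrt(width)
    return [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]

def forward_norms(depth, width=32, gain=1.5, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norms = []
    for _ in range(depth):
        w = make_layer(width, gain, rng)
        # Plain fully connected layer: matrix-vector product only.
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        norms.append(math.sqrt(sum(v * v for v in x)))
    return norms

grow = forward_norms(depth=30, gain=1.5)
shrink = forward_norms(depth=30, gain=0.5)
print(grow[0], grow[-1])
print(shrink[0], shrink[-1])
```

Each layer multiplies the signal by roughly `gain`, so after 30 layers a modest per-layer factor compounds into an explosion or a collapse. In a linear stack like this one, backpropagated gradients pass through the transposed weights and compound the same way, which is one plausible route to the NaNs mentioned above.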