Hacker News
by mendeza 2782 days ago
Are there any good guides, tutorials, or research papers that investigate or advise on how to inspect weights during training for debugging? The only advice I have read is to watch out for vanishing gradients, and that when fine-tuning, most of the change is seen in the layers toward the end of the network rather than in the early layers.
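
For what it's worth, the fine-tuning observation can be checked directly by snapshotting the weights before and after some training and comparing them layer by layer. Below is a minimal sketch in plain Python. The layer names (`conv1`, `head`), the snapshot values, and the helper `relative_change` are all made up for illustration; in a real framework the snapshots would come from the model's parameters.

```python
import math

def l2_norm(xs):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in xs))

def relative_change(before, after):
    """Per-layer ||after - before|| / ||before||.

    `before` and `after` map a layer name to a flat list of weights.
    """
    report = {}
    for name, w0 in before.items():
        w1 = after[name]
        delta = [b - a for a, b in zip(w0, w1)]
        denom = l2_norm(w0)
        report[name] = l2_norm(delta) / denom if denom > 0 else float("inf")
    return report

# Hypothetical snapshots: an early layer that barely moves, a head that moves a lot.
before = {"conv1": [0.5, -0.5, 0.5, -0.5], "head": [1.0, 0.0, 0.0, 0.0]}
after = {"conv1": [0.5, -0.5, 0.5, -0.49], "head": [0.7, 0.4, 0.0, 0.0]}

changes = relative_change(before, after)
for name, ratio in changes.items():
    print(f"{name}: relative change {ratio:.3f}")
```

Applying the same per-layer norm to gradients instead of weight differences gives a rough check for vanishing gradients: a layer whose gradient norm stays near zero is not learning.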
1 comment

Yes. A recent, exciting phenomenon of interest to researchers is how and why the spectrum of the Hessian appears to separate into two parts: a "bulk" that changes very slowly and a few "outliers" that change quickly. This suggests that only a small part of the model (a few weights, or more precisely a few directions in weight space) actually changes much during training. If one could determine which these are, it might lead to faster and more efficient learning algorithms that do not have to backpropagate to all the parameters in a large neural network.

https://arxiv.org/pdf/1706.04454.pdf

https://openreview.net/forum?id=ByeTHsAqtX
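
To make the "outlier" idea concrete, here is a rough sketch of estimating the largest-magnitude Hessian eigenvalue with power iteration on Hessian-vector products. It is not the method from the linked papers. It assumes a toy quadratic loss whose Hessian is diagonal with one large curvature, and it uses finite differences in place of autodiff so that it runs with the standard library alone. The names `toy_loss`, `hvp`, and `top_eigenvalue` are made up for illustration.

```python
import math
import random

def grad(loss, w, eps=1e-5):
    """Central-difference gradient of a scalar loss at w (a list of floats)."""
    g = []
    for i in range(len(w)):
        hi = w[:i] + [w[i] + eps] + w[i + 1:]
        lo = w[:i] + [w[i] - eps] + w[i + 1:]
        g.append((loss(hi) - loss(lo)) / (2 * eps))
    return g

def hvp(loss, w, v, eps=1e-4):
    """Hessian-vector product H v, as a central difference of gradients."""
    wp = [wi + eps * vi for wi, vi in zip(w, v)]
    wm = [wi - eps * vi for wi, vi in zip(w, v)]
    gp, gm = grad(loss, wp), grad(loss, wm)
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

def top_eigenvalue(loss, w, iters=100, seed=0):
    """Power iteration: returns (largest-magnitude eigenvalue, eigenvector)."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(loss, w, v)
        lam = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0:
            break
        v = [x / norm for x in hv]
    return lam, v

# Toy loss (an assumption): one "outlier" curvature of 10, a "bulk" of 0.1.
CURVATURES = [10.0, 0.1, 0.1, 0.1]

def toy_loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURES, w))

w0 = [0.5, -0.3, 0.2, 0.1]
lam, v = top_eigenvalue(toy_loss, w0)
print("top eigenvalue estimate:", lam)  # should be close to 10
print("eigenvector:", v)                # should point along the first coordinate
```

On a real network one would swap the finite differences for the framework's autodiff Hessian-vector product. The returned eigenvector also shows which weights the sharp direction is concentrated on.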