by kkjkok 4596 days ago
Check out this paper:

Practical recommendations for gradient-based training of deep architectures, Y. Bengio

http://arxiv.org/abs/1206.5533

There is a section on weight initialization on page 15. In general, this paper has a lot of good information in one place.