Hacker News
by shenberg 859 days ago
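I suspect that common weight initializations (e.g. Glorot/He) are geared towards inputs being normal random variables with mean 0 and variance 1. Deviating from that, say by feeding in unscaled features, makes the learning process unhappy: activations and gradients start out far too large or too small.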
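To illustrate the point, here is a minimal sketch in plain Python (stdlib only). It assumes a bias-free stack of Linear+ReLU layers with He (Kaiming) normal initialization, which is one common scheme and not necessarily the one the parent discussion used. The width, depth, and raw feature distribution (mean 50, std 10) are hypothetical choices. He init is chosen so each ReLU layer roughly preserves the mean square of its input, so whatever scale the input has is carried through the whole network.

```python
import math
import random

WIDTH = 256  # hypothetical layer width
DEPTH = 5    # hypothetical number of Linear+ReLU layers

def he_normal(fan_in, fan_out, rng):
    # He/Kaiming normal init: std = sqrt(2 / fan_in) keeps E[h^2] about equal
    # to E[x^2] through each ReLU layer, so unit-scale inputs stay unit-scale.
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def forward(x, layers):
    # Bias-free Linear + ReLU stack.
    for w in layers:
        x = [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x

def mean_square(v):
    return sum(t * t for t in v) / len(v)

rng = random.Random(0)
layers = [he_normal(WIDTH, WIDTH, rng) for _ in range(DEPTH)]

# Hypothetical raw features with mean 50 and std 10 (e.g. unscaled measurements).
raw = [rng.gauss(50.0, 10.0) for _ in range(WIDTH)]
# Standardize with the known feature mean/std, giving roughly mean 0, variance 1.
standardized = [(t - 50.0) / 10.0 for t in raw]

ms_raw = mean_square(forward(raw, layers))
ms_std = mean_square(forward(standardized, layers))
print(f"output mean square, raw inputs:          {ms_raw:.1f}")
print(f"output mean square, standardized inputs: {ms_std:.3f}")
```

Since the raw features have E[x^2] = 50² + 10² = 2600, the raw run should end up with an output mean square in the thousands, while the standardized run stays near 1. With sigmoid or tanh layers, inputs that large would saturate the units; with ReLU they inflate activations and gradients instead.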