Hacker News
by charcircuit 54 days ago
>you'd need every single one of them, millions upon millions of them, to be all zero

If they were all correlated with each other, that does not seem far-fetched.


OK, but it's already well known that you shouldn't initialize your network parameters to a single constant; you should initialize them with random numbers instead.
The model can converge towards such a state even if randomly initialized.
Both you and the comment above are correct: initializing with i.i.d. elements ensures that correlations are not disastrous for training, but strong correlations get baked into the weights during training, so pretty much anything could happen.
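
For what it's worth, the constant-initialization point is easy to check on a toy example. Below is a minimal sketch, assuming a tiny two-layer tanh network with no biases, trained by plain gradient descent on XOR-like data. The names `train`, `max_row_spread`, and `data` are made up for illustration and do not come from any library or from the thread. It only shows the symmetry problem at initialization; it says nothing about what correlations a large model picks up later in training.

```python
import math
import random

def train(w1, w2, data, lr=0.05, steps=200):
    """Plain gradient descent on a 2-layer tanh net with one scalar output.

    w1: one weight vector per hidden unit; w2: one output weight per hidden unit.
    (Hypothetical helper for illustration only.)
    """
    w1 = [row[:] for row in w1]  # copy so the caller's lists are untouched
    w2 = w2[:]
    for _ in range(steps):
        g1 = [[0.0] * len(row) for row in w1]
        g2 = [0.0] * len(w2)
        for x, y in data:
            # Forward pass: hidden activations, then squared-error residual.
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
            err = sum(v * hj for v, hj in zip(w2, h)) - y
            # Backward pass: accumulate gradients over the whole batch.
            for j in range(len(w1)):
                g2[j] += err * h[j]
                back = err * w2[j] * (1.0 - h[j] ** 2)
                for i, xi in enumerate(x):
                    g1[j][i] += back * xi
        for j in range(len(w1)):
            w2[j] -= lr * g2[j]
            for i in range(len(w1[j])):
                w1[j][i] -= lr * g1[j][i]
    return w1, w2

def max_row_spread(w1):
    """Largest absolute gap between any hidden unit's weights and unit 0's."""
    return max(abs(a - b) for row in w1 for a, b in zip(row, w1[0]))

data = [([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0), ([0.0, 0.0], 0.0)]

# Constant init: every hidden unit starts with the same weights.
const_w1, _ = train([[0.5, 0.5] for _ in range(3)], [0.5] * 3, data)

# Random i.i.d. init: hidden units start out different.
random.seed(0)
rand_w1, _ = train(
    [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)],
    [random.uniform(-0.5, 0.5) for _ in range(3)],
    data,
)

print("constant init spread:", max_row_spread(const_w1))
print("random init spread:  ", max_row_spread(rand_w1))
```

With the constant start, every hidden unit receives exactly the same update at every step, so the units stay copies of each other and the spread stays at zero. The random start breaks that symmetry from the first step.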