by gwern 3043 days ago
Initialization?

> W = np.random.normal(0, np.sqrt(2/(h.shape[0] + layer_dim[i])), size = (layer_dim[i], h.shape[0]))

Weights drawn from an N(0, sqrt(2/width)) distribution would include negative values.
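To illustrate the point about the quoted line, here is a minimal sketch of the same formula. Its standard deviation, sqrt(2/(fan_in + fan_out)), is Glorot/Xavier normal initialization. The layer sizes (256 in, 128 out) and the helper name `glorot_normal` are hypothetical and chosen only for illustration. Because the distribution is zero-mean, about half of the weights come out negative:

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    # Same formula as the quoted line: std = sqrt(2 / (fan_in + fan_out)).
    # The second argument of normal() is the standard deviation, not the variance.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    # Shape (fan_out, fan_in), matching (layer_dim[i], h.shape[0]) in the quote.
    return rng.normal(0.0, std, size=(fan_out, fan_in)), std

rng = np.random.default_rng(0)
# Hypothetical layer sizes, for illustration only.
W, std = glorot_normal(fan_in=256, fan_out=128, rng=rng)

frac_negative = np.mean(W < 0)
print(f"target std {std:.4f}, empirical std {W.std():.4f}")
print(f"fraction of negative weights: {frac_negative:.3f}")
```

Any zero-mean initializer behaves this way, so negative weights alone don't indicate a bug in the initialization.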


I was talking about the graphs here: https://i.imgur.com/M6P71aC.jpg

I missed that he's not storing raw activations for those graphs; he's storing the activations after batch norm. See my edit.
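A minimal sketch of why this matters for the graphs. It assumes a ReLU nonlinearity, which the thread doesn't state, and training-mode batch norm with the learned scale and shift fixed at gamma=1 and beta=0. Batch size and layer widths are hypothetical. Raw ReLU activations are never negative, but batch norm re-centers each unit to zero mean, so the stored values go negative:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Training-mode batch normalization over the batch axis (axis 0),
    # with gamma=1 and beta=0 (an assumption; real layers learn these).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
batch, fan_in, fan_out = 512, 256, 128  # hypothetical sizes
x = rng.normal(size=(batch, fan_in))
W = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_out, fan_in))

relu_out = np.maximum(x @ W.T, 0.0)  # raw activations (assumed ReLU): never negative
bn_out = batch_norm(relu_out)        # activations after batch norm: zero mean per unit

print("min raw activation:", relu_out.min())
print("min after batch norm:", bn_out.min())
print("fraction negative after batch norm:", np.mean(bn_out < 0))
```

Every ReLU output that sits below its unit's batch mean, including all the zeros, maps to a negative value after normalization.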