> W = np.random.normal(0, np.sqrt(2/(h.shape[0] + layer_dim[i])), size = (layer_dim[i], h.shape[0]))
Sampling from N(0, sqrt(2/width)) would produce negative values, since the distribution is centered at zero.
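For what it's worth, here is a minimal sketch of that point, assuming made-up layer sizes (`fan_in` and `fan_out` are hypothetical stand-ins for `h.shape[0]` and `layer_dim[i]` in the quoted line). It only checks the sign split of the sampled weights and is not the original poster's full code.

```python
import numpy as np

# Hypothetical sizes, standing in for h.shape[0] and layer_dim[i].
fan_in, fan_out = 256, 128

rng = np.random.default_rng(0)  # seeded so the check is repeatable

# Same standard deviation as in the quoted line: sqrt(2 / (fan_in + fan_out)).
std = np.sqrt(2.0 / (fan_in + fan_out))
W = rng.normal(0.0, std, size=(fan_out, fan_in))

# A zero-mean normal is symmetric, so roughly half the weights are negative.
neg_fraction = (W < 0).mean()
print(f"std target={std:.4f}, sample std={W.std():.4f}, negative fraction={neg_fraction:.3f}")
```

Whether negative values then show up in the stored quantities depends on what is applied after the matrix multiply, which this sketch does not model.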
I missed that he's not storing activations for those graphs, he's storing activations+batch norm. See my edit.