HN Mirror



	by ripvanwinkle 1047 days ago
	Thank you! What is batch normalization doing, and how does it help?

4 comments

pseudonom- 1046 days ago

There are other mechanisms for dealing with vanishing and exploding gradients. I (maybe wrongly?) think of batch normalization as being most distinctively about fighting internal covariate shift: https://machinelearning.wtf/terms/internal-covariate-shift/


bkitano19 1046 days ago

Karpathy covers this in Makemore, but the tl;dr is that if you don't normalize the batch (essentially center and scale your activations to roughly zero mean and unit variance), then at gradient/backprop time, you may get per-layer gradient values that are significantly smaller or greater than 1. This is a problem because, as you stack layers in sequence (passing outputs to inputs), the gradient compounds multiplicatively (because of the chain rule), so what may have been a well-behaved gradient at the end layers has either vanished by the time it reaches the early layers (the gradients were 0<x<1 at each layer) or exploded (the gradients were x>>1 at each layer). Batch normalization helps control the vanishing/exploding gradient problem in deep neural nets by normalizing the values passed between layers.
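
A rough sketch of the two ideas above, in plain Python with no framework. The names `batch_norm` and `chained_gradient` are made up for illustration, and the per-layer factors (0.5 and 1.5) and depth (50) are arbitrary assumptions. Real batch norm layers also carry a learnable scale and shift and track running statistics for inference, which this leaves out.

```python
import math
import random


def batch_norm(values, eps=1e-5):
    """Standardize one feature across a batch: roughly zero mean, unit variance.

    Simplified on purpose: no learnable scale/shift, no running statistics.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def chained_gradient(per_layer_factor, depth):
    """Chain rule across `depth` layers, assuming every layer contributes
    the same local gradient factor (a deliberate oversimplification)."""
    return per_layer_factor ** depth


# Modest per-layer factors compound quickly over 50 layers.
vanished = chained_gradient(0.5, 50)   # shrinks toward zero
exploded = chained_gradient(1.5, 50)   # grows very large

# A badly scaled batch of activations for a single feature (made-up numbers).
random.seed(0)
batch = [random.gauss(40.0, 12.0) for _ in range(256)]
normed = batch_norm(batch)

normed_mean = sum(normed) / len(normed)
normed_var = sum((v - normed_mean) ** 2 for v in normed) / len(normed)

print(f"vanished: {vanished:.3e}, exploded: {exploded:.3e}")
print(f"after batch_norm: mean={normed_mean:.6f}, var={normed_var:.6f}")
```

The point of the second half is only that the values handed to the next layer sit on a predictable scale regardless of how the raw activations were scaled, which is what keeps the per-layer factors from drifting far from 1.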


ripvanwinkle 1046 days ago

Got it, thanks.


mike_hearn 1046 days ago

It's another one of those mathematical hacks that NNs love so much, which stops the numbers spiralling out of control in big networks.


ripvanwinkle 1046 days ago

Folks, thanks for the explanation.
