by dinedal 3522 days ago
Hey jcjohns, fan of your work.

I've noticed that your fast-neural-style project uses instance normalization instead of batch normalization.

Batch normalization has the benefit that you can merge the gamma & beta into a convolutional layer for the forward pass, which makes generating styled images with a trained model a lot faster because the separate normalization step can be skipped.

Can the same be done with instance normalization? I didn't see a formula in the paper but I would think so, since they are fairly closely related.


I've found that instance normalization usually gives better results so I prefer it over batch normalization.

With batch norm you keep four scalars per convolutional feature map: mu (mean), sigma (stddev), alpha (scale, the gamma in your question) and beta (shift). During training, mu and sigma are estimated from the statistics of each minibatch; during testing they are constants, either estimated from the entire training set or computed as a running mean during training. At test time the batch norm operation is then alpha * (x - mu) / sigma + beta, which is a linear (strictly speaking, affine) operation since everything but x is constant; since it is linear it can be merged into a convolutional layer's weights and bias.
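
To make the merge concrete, here is a rough sketch in plain Python, not code from fast-neural-style. It assumes a 1x1 convolution (a per-pixel weight matrix plus bias) to keep things small, and the helper names (conv1x1, batch_norm_test, fold_batch_norm) and all the numbers are made up for illustration. Real frameworks typically store a variance and use sqrt(var + eps) in place of sigma, but the algebra is the same.

```python
def conv1x1(x, weight, bias):
    """Apply a 1x1 convolution at one pixel; x holds the input-channel values."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm_test(y, mu, sigma, alpha, beta):
    """Test-time batch norm: mu and sigma are fixed per-channel constants."""
    return [a * (v - m) / s + b for v, m, s, a, b in zip(y, mu, sigma, alpha, beta)]


def fold_batch_norm(weight, bias, mu, sigma, alpha, beta):
    """Fold test-time batch norm into the conv's weights and bias.

    alpha * ((W x + b) - mu) / sigma + beta
      == (alpha / sigma) * W x + (alpha / sigma) * (b - mu) + beta
    """
    scale = [a / s for a, s in zip(alpha, sigma)]
    new_weight = [[sc * w for w in row] for sc, row in zip(scale, weight)]
    new_bias = [sc * (b - m) + be for sc, b, m, be in zip(scale, bias, mu, beta)]
    return new_weight, new_bias


# Hypothetical values: 2 input channels, 2 output channels.
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.3]
mu, sigma = [0.2, -0.1], [1.5, 0.5]
alpha, beta = [1.2, 0.8], [0.0, 0.5]

x = [1.0, 2.0]
two_step = batch_norm_test(conv1x1(x, weight, bias), mu, sigma, alpha, beta)
folded_weight, folded_bias = fold_batch_norm(weight, bias, mu, sigma, alpha, beta)
one_step = conv1x1(x, folded_weight, folded_bias)
print(two_step, one_step)  # the two results should agree up to rounding
```

The folding only works because mu and sigma do not depend on x, so they can be baked into the weights once, ahead of time.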

With instance norm, mu and sigma are estimated from data statistics during both training and testing; this means that the test-time forward pass is nonlinear, so it cannot be merged into a convolution (which is linear).
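
A small sketch of why that breaks the merge, again in plain Python with a made-up helper (instance_norm) and made-up numbers, and assuming the common sqrt(var + eps) form for sigma. Multiplying the input feature map by 10 leaves the instance-normalized output essentially unchanged, whereas any fixed linear layer would scale its output by 10, so no single set of conv weights can reproduce it.

```python
import math


def instance_norm(channel, alpha, beta, eps=1e-5):
    """Normalize one feature map using its own mean and stddev."""
    n = len(channel)
    mu = sum(channel) / n
    var = sum((v - mu) ** 2 for v in channel) / n
    sigma = math.sqrt(var + eps)
    return [alpha * (v - mu) / sigma + beta for v in channel]


# Hypothetical feature map for one channel of one image.
feature_map = [1.0, 2.0, 3.0, 4.0]
scaled_map = [10.0 * v for v in feature_map]

out = instance_norm(feature_map, alpha=1.0, beta=0.0)
out_scaled = instance_norm(scaled_map, alpha=1.0, beta=0.0)

# mu and sigma were recomputed from each input, so the factor of 10 cancels.
print(out)
print(out_scaled)
```

Since mu and sigma change with every image, there are no constants to bake into the convolution ahead of time.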

Awesome, thanks for your response!