by dinedal
3522 days ago
Hey jcjohns, fan of your work. I've noticed that your fast-neural-style project uses instance normalization instead of batch normalization. Batch normalization has the benefit that you can merge the gamma & beta into a convolutional layer on the forward pass, which makes it a lot faster by allowing you to skip a step when building the styled images using a trained model. Can the same be done with instance normalization? I didn't see a formula in the paper but I would think so, since they are fairly closely related.
With batch norm you learn four scalars per convolutional feature map: mu (mean), sigma (stddev), alpha (scale) and beta (shift). During training, mu and sigma are estimated from data statistics; during testing they are constants, either estimated from the entire training set or computed as a running mean during training. At test time the batch norm operation is then alpha * (x - mu) / sigma + beta, which is a linear (strictly speaking, affine) operation in x since everything but x is constant; since a convolution with a bias term has the same form, the batch norm can be merged into a convolutional layer.
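To make the merge concrete, here is a minimal sketch in plain Python, assuming a toy 1D convolution with one input channel and two output channels. The helper names (conv1d, batch_norm_test, fold_batch_norm) and all the numbers are made up for illustration; they are not from fast-neural-style or any library. The folded layer uses weights w * alpha / sigma and bias (b - mu) * alpha / sigma + beta.

```python
def conv1d(x, weights, biases):
    """Valid 1D cross-correlation: one input channel, one filter per output channel."""
    k = len(weights[0])
    return [
        [sum(w[j] * x[i + j] for j in range(k)) + b for i in range(len(x) - k + 1)]
        for w, b in zip(weights, biases)
    ]


def batch_norm_test(channels, mu, sigma, alpha, beta):
    """Test-time batch norm with per-channel constants: alpha * (x - mu) / sigma + beta."""
    return [
        [a * (v - m) / s + b for v in ch]
        for ch, m, s, a, b in zip(channels, mu, sigma, alpha, beta)
    ]


def fold_batch_norm(weights, biases, mu, sigma, alpha, beta):
    """Return conv weights and biases that already include the batch norm."""
    scale = [a / s for a, s in zip(alpha, sigma)]
    new_weights = [[wj * sc for wj in w] for w, sc in zip(weights, scale)]
    new_biases = [(b - m) * sc + be for b, m, sc, be in zip(biases, mu, scale, beta)]
    return new_weights, new_biases


# Illustrative values only (assumptions, not trained parameters).
x = [0.5, -1.0, 2.0, 3.0, -0.5]
weights = [[1.0, 0.0, -1.0], [0.5, 0.25, 0.125]]
biases = [0.1, -0.2]
mu, sigma = [0.3, -0.1], [2.0, 0.5]
alpha, beta = [1.5, 0.8], [0.0, 0.4]

# Convolution followed by batch norm ...
two_step = batch_norm_test(conv1d(x, weights, biases), mu, sigma, alpha, beta)
# ... versus a single convolution with folded parameters.
folded_w, folded_b = fold_batch_norm(weights, biases, mu, sigma, alpha, beta)
one_step = conv1d(x, folded_w, folded_b)

max_diff = max(
    abs(p - q) for r1, r2 in zip(two_step, one_step) for p, q in zip(r1, r2)
)
print(max_diff < 1e-9)
```

The two paths agree up to floating-point rounding, so the normalization step can be dropped at test time. Real implementations usually divide by sqrt(variance + epsilon) rather than a bare sigma; the folding works the same way with that value in place of sigma.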
With instance norm, mu and sigma are estimated from data statistics during both training and testing; this means that the test-time forward pass is nonlinear, so it cannot be merged into a convolution (which is linear).
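One way to see the difference is that instance norm gives the same output for x and for 3 * x, because mu and sigma rescale along with the input. A fixed affine map f(x) = A x + c can only satisfy f(x) = f(3x) for every x if it is constant, and instance norm is clearly not constant. The sketch below checks this in plain Python on a single feature map; the function name and the inputs are illustrative assumptions, and the small epsilon that real implementations add to the variance is left out to keep the scaling property exact.

```python
import math


def instance_norm(x, alpha=1.0, beta=0.0):
    """Normalize one feature map with its own mean and stddev, then scale and shift."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [alpha * (v - mu) / sigma + beta for v in x]


# Illustrative inputs only (assumptions).
x = [1.0, 2.0, 4.0, 7.0]
y = [1.0, 3.0, 3.0, 7.0]
scaled = [3.0 * v for v in x]

out_x = instance_norm(x)
out_scaled = instance_norm(scaled)
out_y = instance_norm(y)

# Scaling the input does not change the output ...
same_after_scaling = all(abs(p - q) < 1e-9 for p, q in zip(out_x, out_scaled))
# ... yet the output is not constant: a different input gives a different result.
differs_for_other_input = any(abs(p - q) > 1e-6 for p, q in zip(out_x, out_y))
print(same_after_scaling, differs_for_other_input)
```

Since the effective scale and shift change with every input image, there is no single set of convolution weights and biases that reproduces them.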