by jcreinhold 2179 days ago
Regarding smaller batch sizes and batch normalization: Have you found another normalization layer to work better for small batch sizes?

I agree that the mean and variance of the small batches won't be representative of the true mean and variance, but in practice, I've used batch norm for small batch sizes successfully (e.g., <8). In medical imaging, due to memory constraints, I commonly see batch sizes of 2 (or even 1, although it's not really "batch norm" at that point).
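
To put a rough number on the "not representative" point, here is a minimal sketch, not taken from the paper or from any framework. It assumes activations drawn from a unit Gaussian (true mean 0, variance 1) and measures how much the per-batch mean wanders at two batch sizes. The helper name `batch_mean_spread` and the trial count are made up for illustration.

```python
import random
import statistics


def batch_mean_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of per-batch means for synthetic unit-Gaussian data.

    Hypothetical helper: real activations are not Gaussian, so treat the
    numbers as an illustration of the trend only.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # One "batch" of scalar activations with true mean 0 and variance 1.
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
    # How far the batch mean typically lands from the true mean of 0.
    return statistics.pstdev(means)


spread_small = batch_mean_spread(2)
spread_large = batch_mean_spread(64)
print(f"batch size 2:  spread of batch means = {spread_small:.3f}")
print(f"batch size 64: spread of batch means = {spread_large:.3f}")
```

The spread should shrink roughly as 1/sqrt(batch size), so the statistics at batch size 2 are much noisier than at 64. That says nothing about whether the noise hurts training, which is the part my experience and the paper below speak to.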

The paper "Revisiting small batch training for deep neural networks" [1] discusses the benefits of small batch sizes even in the presence of batch norm (see Fig. 13, 14). They only look at some standard CV datasets, so it isn't conclusive by any means, but the experimental results jibe with my experience and with what appears to be other researchers' experience.

[1] https://arxiv.org/pdf/1804.07612.pdf