by BenFielding 3571 days ago
According to the AlexNet paper (the first real ImageNet CNN success story - http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf), the choice of 224x224 (actually 227, I believe; there's been some confusion over the paper) comes from their use of data augmentation techniques (translations and reflections) on the 256x256 images. The sizes of ImageNet images vary, but I believe it is common to rescale and crop them to 256x256 as a tradeoff between a manageable size and minimal overall information loss.

Sections 3.5 and 4.1 of the above paper have more information.
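
To make the 224 vs 227 confusion concrete, here is a rough sketch of the output-size arithmetic for the first convolution. It assumes 11x11 filters with stride 4 and no padding; the no-padding part is my assumption, and `conv_output_size` is just a helper name made up for illustration.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Spatial output size of a convolution along one axis.

    Returns None when the kernel does not step evenly across the input,
    i.e. when (input + 2*padding - kernel) is not a multiple of the stride.
    """
    span = input_size + 2 * padding - kernel_size
    if span < 0 or span % stride != 0:
        return None
    return span // stride + 1


# Assumed first-layer settings: 11x11 kernel, stride 4, no padding.
for size in (224, 227):
    print(size, conv_output_size(size, kernel_size=11, stride=4))
```

Under those assumptions 227 divides evenly (giving 55 positions per axis) while 224 does not, which is probably where the 227 figure comes from.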

edit: So I guess really it's down to:

1. The fact that square images are much easier to work with

2. The images are cropped to 256x256 because it's a convenient average size for ImageNet

3. The 224/227 sizes are used to allow for the extraction of random patches for translation invariance
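
As a rough illustration of point 3, this is a minimal sketch of random-patch extraction with horizontal reflection, not the paper's actual code. The image is a plain list of rows so the example has no dependencies; `random_crop_and_flip` and the 50% flip probability are my own assumptions.

```python
import random


def random_crop_and_flip(image, crop_size=224, rng=random):
    """Take a random crop_size x crop_size patch and maybe mirror it.

    image is a list of rows, each row a list of pixel values.
    """
    height, width = len(image), len(image[0])
    if crop_size > height or crop_size > width:
        raise ValueError("crop is larger than the image")
    # randint is inclusive at both ends, so every valid offset is reachable.
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    patch = [row[left:left + crop_size] for row in image[top:top + crop_size]]
    # Horizontal reflection with probability 0.5 (assumed, for illustration).
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    return patch


# Dummy 256x256 "image" whose pixels record their own (row, column).
image = [[(y, x) for x in range(256)] for y in range(256)]
patch = random_crop_and_flip(image)

# Number of distinct crop offsets along each axis for 256 -> 224.
offsets_per_axis = 256 - 224 + 1
print(len(patch), len(patch[0]), offsets_per_axis)
```

Each training pass then sees a slightly shifted (and sometimes mirrored) view of the same 256x256 image, which is what pushes the network towards translation invariance.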