by BenFielding
3571 days ago
According to the AlexNet paper (the first real ImageNet CNN success story - http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf), the choice of 224x224 (actually 227x227 in practice; I believe there's been some confusion over the figure in the paper) comes from their data augmentation (translations and reflections): random patches are extracted from the 256x256 images. ImageNet image sizes vary, but I believe it is common to crop to 256x256 as a trade-off between input size and the amount of information lost. Sections 3.5 and 4.1 of the paper have more information. Edit:
So I guess it really comes down to:
1. Square images are much easier to work with.
2. The images are cropped to 256x256 because it's a convenient average size for ImageNet.
3. The 224/227 sizes leave room to extract random patches from the 256x256 image, which gives translation invariance.
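For anyone who wants to see point 3 concretely, here is a rough NumPy sketch of the random-patch-plus-reflection idea. It is not the paper's code; `random_crop_and_flip` and `conv_output_size` are names I made up for illustration. The second function is just the standard convolution output-size formula, applied to the paper's first layer (11x11 filters, stride 4) under the assumption of no padding, which I think is where the 224 vs 227 confusion comes from.

```python
import numpy as np


def random_crop_and_flip(image, crop_size=224, rng=None):
    """Return a random crop_size x crop_size patch, mirrored half the time.

    Hypothetical helper: a sketch of translation + reflection augmentation,
    not the original implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    # Top-left corner chosen so the patch stays inside the image.
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    patch = image[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch


def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Standard formula for the spatial size of a convolution output."""
    return (input_size + 2 * padding - kernel_size) / stride + 1


image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in for a 256x256 image
patch = random_crop_and_flip(image, rng=np.random.default_rng(0))
print(patch.shape)                   # (224, 224, 3)

# First layer: 11x11 filters, stride 4, assuming no padding.
print(conv_output_size(227, 11, 4))  # 55.0
print(conv_output_size(224, 11, 4))  # 54.25
```

With no padding, only the 227 input gives a whole-number output size (55), so 224 would need some padding or a slightly different layout to work.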