According to the AlexNet paper (the first real ImageNet CNN success story: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf), the choice of 224x224 (though it is actually 227x227; I believe there has been some confusion over this in the paper) was due to their use of data augmentation (translations and horizontal reflections) on 256x256 images. The sizes of ImageNet images vary, but I believe it is common to rescale and crop them to 256x256 as a tradeoff between keeping the input small and losing as little overall information as possible.
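The 224 vs 227 discrepancy can be checked with the standard convolution output-size formula, (W - K + 2P) / S + 1. The paper gives the first layer as 11x11 kernels with a stride of 4, and its architecture figure shows 55x55 maps after that layer; zero padding (P = 0) is my assumption here. A quick sketch:

```python
def conv_output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> float:
    """Spatial output size of a convolution: (W - K + 2P) / S + 1.

    Returns a float so that a non-integer result (the kernel does not
    tile the input evenly) is visible instead of being silently floored.
    """
    return (input_size - kernel + 2 * padding) / stride + 1


# AlexNet's first conv layer: 11x11 kernels, stride 4 (padding of 0 assumed).
for size in (224, 227):
    print(size, "->", conv_output_size(size, kernel=11, stride=4))
# 224 -> 54.25
# 227 -> 55.0
```

Only 227 lands exactly on 55, which is why many people read the "224" in the paper as a typo (or as implying some padding that the paper does not state).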
Sections 3.5 and 4.1 of the above paper have more information.
edit:
So I guess really it's down to:
1. Square images are much easier to work with.
2. The images are cropped to 256x256 because it's a convenient average size for ImageNet.
3. The 224/227 sizes allow random patches to be extracted from the 256x256 images, which gives some translation invariance.
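Steps 2 and 3 can be sketched with NumPy roughly as follows. This is a sketch under my own assumptions, not the paper's code: it assumes the image has already been rescaled so its shorter side is 256 (the resize itself is left out), and the helper names `center_crop` and `random_crop_and_flip` are hypothetical.

```python
import numpy as np


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size patch of an H x W x C image."""
    h, w = image.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]


def random_crop_and_flip(image: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Random size x size patch (translation) with a 50% chance of a horizontal reflection."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))   # upper bound is exclusive
    left = int(rng.integers(0, w - size + 1))
    patch = image[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # mirror along the width axis
    return patch


rng = np.random.default_rng(0)
# Hypothetical non-square input whose shorter side has already been rescaled to 256.
raw = rng.random((256, 340, 3))
square = center_crop(raw, 256)                  # 256 x 256 x 3
patch = random_crop_and_flip(square, 224, rng)  # 224 x 224 x 3 (use 227 if you prefer)
print(square.shape, patch.shape)
```

Because a fresh patch position and flip are drawn every time an image is seen, the network rarely sees exactly the same pixels twice, which is the point of cropping down from 256 rather than feeding the full image.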