Hacker News
by jrbaldwin 3902 days ago
We usually run region proposal first and crop to the area of interest (here the face, usually at 256x256), then apply a transform to align the eyes before passing the crops to training. This standardizes the data up front. I'm not sure whether this lib does region proposal, but you can easily write a pre-processor with OpenCV's face detectors to find face regions for cropping (if there are any; your training image might be a landscape, not a face!).
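To make the "align eye areas" step concrete, here is a minimal sketch of the arithmetic, not the method any particular library uses. It assumes you already have the two eye centers from some landmark detector, and it builds a similarity transform (rotate, scale, translate) that levels the eyes and puts them at fixed spots in a 256x256 crop. The function names and the target fractions `eye_y_frac` and `eye_dist_frac` are made up for illustration.

```python
import math


def eye_alignment_transform(left_eye, right_eye, out_size=256,
                            eye_y_frac=0.4, eye_dist_frac=0.4):
    """Return a 2x3 affine matrix mapping source pixels into an aligned crop.

    left_eye / right_eye are (x, y) eye centers in the source image, with
    left_eye being the one nearer the left edge of the image. The target
    fractions are hypothetical defaults, not values from any library.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    dx, dy = rx - lx, ry - ly
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("eye centers coincide")

    # Rotate by the negative of the eye-line angle so the eyes become level,
    # and scale so the eye distance is a fixed fraction of the crop width.
    angle = math.atan2(dy, dx)
    scale = (eye_dist_frac * out_size) / dist
    cos_a = math.cos(-angle) * scale
    sin_a = math.sin(-angle) * scale

    # Translate so the midpoint between the eyes lands at a fixed location.
    mx, my = (lx + rx) / 2.0, (ly + ry) / 2.0
    tx = out_size / 2.0 - (cos_a * mx - sin_a * my)
    ty = eye_y_frac * out_size - (sin_a * mx + cos_a * my)
    return [[cos_a, -sin_a, tx],
            [sin_a, cos_a, ty]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

The matrix is a forward source-to-crop mapping in 2x3 layout, so it should be usable with a warp routine such as OpenCV's `cv2.warpAffine` after converting it to a float array.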
1 comment

Yes, the processing pipeline first runs face detection and a simple transformation to normalize every face to 96x96 RGB pixels. Each face is then passed through the neural network to get a 128-dimensional representation on the unit hypersphere.
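As a rough illustration of what "on the unit hypersphere" means: the 128 numbers are scaled so the vector has length 1, which makes distances between representations directly comparable. The sketch below is not the network's own code; `l2_normalize` and `squared_distance` are hypothetical helper names, and the input is a placeholder vector rather than a real network output.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (a point on the unit hypersphere)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Placeholder 128-dimensional vector standing in for a raw network output.
raw = [float(i + 1) for i in range(128)]
embedding = l2_normalize(raw)
```

For two unit-length vectors the squared distance always falls between 0 (identical direction) and 4 (opposite direction), so a single threshold can be applied across all pairs.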

For a landscape, face detection would probably not find any faces and the neural network wouldn't be called.

And an image with multiple people will produce multiple outputs: a bounding box and an associated representation for each detected face.
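The control flow described above (no faces means no network call, several faces mean several outputs) can be sketched roughly like this. It is only an outline under assumptions: `represent_faces`, `FaceResult`, and the two callables are hypothetical names, and the detector and network are passed in as plain functions rather than taken from any real library.

```python
from typing import Any, Callable, List, NamedTuple, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height), an assumed convention


class FaceResult(NamedTuple):
    bbox: BBox
    embedding: List[float]


def represent_faces(
    image: Any,
    detect_faces: Callable[[Any], Sequence[BBox]],
    embed_face: Callable[[Any, BBox], List[float]],
) -> List[FaceResult]:
    """Return one (bounding box, representation) pair per detected face.

    If the detector finds nothing, e.g. for a landscape, the loop body never
    runs, so embed_face (the neural network) is never called and the result
    is an empty list.
    """
    results = []
    for bbox in detect_faces(image):
        results.append(FaceResult(bbox, embed_face(image, bbox)))
    return results
```

Passing the detector and the network in as arguments keeps the sketch independent of any specific face detector or model.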