Hacker News new | ask | show | jobs
by bdamos 3905 days ago
Yes, the processing pipeline first does face detection and a simple transformation to normalize all faces to 96x96 RGB pixels. Then each face is passed into the neural network to get a 128 dimensional representation on the unit hypersphere.

For a landscape, face detection would probably not find any faces and the neural network wouldn't be called.

And an image with multiple people will have many outputs: the bounding boxes of faces and associated representations.