by huskyr 4006 days ago
I did basically the same thing for a newspaper project, where I collected 9000 selfies (see http://vk.nl/selfies).

It's a lot of manual work, but using OpenCV saves you a lot of time. Unfortunately I can't share the code, but this is what I did:

* Get all Instagram photos with the '#selfie' tag

* Run them all through OpenCV's haarcascade_frontalface_alt2 cascade. I used the values 1.3 and 5 for the detectMultiScale() method (passed positionally, those are scaleFactor and minNeighbors).

* Check that there's only one face in the image, and make sure it's larger than 20% of the width of the image.
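
For anyone who wants to try this, the steps above could be sketched roughly like this in Python. This is not the original code (which can't be shared), just a minimal sketch: the function names are made up, it assumes the opencv-python package (which ships the cascade files under cv2.data.haarcascades), and it assumes 1.3 and 5 map to scaleFactor and minNeighbors.

```python
from typing import Sequence, Tuple

# (x, y, width, height), the box layout detectMultiScale() returns
Box = Tuple[int, int, int, int]


def is_single_large_face(faces: Sequence[Box], image_width: int,
                         min_ratio: float = 0.2) -> bool:
    """True when exactly one face was found and it is wider than
    min_ratio (20% by default) of the image width."""
    if len(faces) != 1:
        return False
    _, _, face_width, _ = faces[0]
    return bool(face_width > min_ratio * image_width)


def load_cascade():
    """Load the frontal-face cascade once and reuse it for every image."""
    import cv2  # imported lazily so the filter above works without OpenCV

    # Assumption: opencv-python is installed and bundles the cascade XML here.
    path = cv2.data.haarcascades + "haarcascade_frontalface_alt2.xml"
    cascade = cv2.CascadeClassifier(path)
    if cascade.empty():
        raise RuntimeError("could not load cascade from " + path)
    return cascade


def looks_like_selfie(image_path: str, cascade) -> bool:
    """Hypothetical helper: detect faces in one image and apply the filter."""
    import cv2

    image = cv2.imread(image_path)
    if image is None:  # unreadable or missing file
        return False
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Assumption: 1.3 and 5 are scaleFactor and minNeighbors.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    return is_single_large_face(faces, image.shape[1])
```

You would call load_cascade() once and then run looks_like_selfie() over the downloaded files. Keeping the one-face/20% check in its own function makes it easy to tweak the threshold without touching the OpenCV part.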

Even after that I still needed to go through the images manually. I'd guess around 10% were still false positives.