by actionowl 2621 days ago
I was working on a project where we'd be printing several hundred thousand badges for several schools. We had all the data and just needed photos. The client sent us a DVD with several hundred thousand photos. Upon inspection, we realized that the photos were really bad:

- No single aspect ratio

- Some photos had no one in them (picture of a chair, etc.)

- Some photos had multiple people in them (!?)

- Some photos were of such poor quality that you couldn't make out the person.

It seemed some locations had let the students provide their own photos. This was the first time we'd ever encountered data in this shape.

My company had two options: print the data as-is (which would result in thousands of reprints) or hire some temp staff to sort through the photos.

I asked them to let me try sorting the photos over the weekend with a library I had just learned about (OpenCV). I wrote a custom OpenCV Python script, a little over a hundred lines long, and ran it over the weekend to crop the photos and sort them into several categories (based on face detection), leaving only a few thousand that had to be manually reviewed! That had a real dollar impact and felt really good.
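
For anyone curious, here is a rough sketch of what a script along those lines could look like. It is not the original script. The Haar cascade face detector, the Laplacian-variance blur check, the thresholds, the 3:4 crop, the bucket names, the `*.jpg` filter, and every function name here are assumptions chosen for illustration.

```python
from pathlib import Path
import shutil


def categorize(face_count, sharpness, min_sharpness=100.0):
    """Map a face count and a sharpness score to a bucket name (hypothetical buckets)."""
    if face_count == 0:
        return "no_face"
    if face_count > 1:
        return "multiple_faces"
    if sharpness < min_sharpness:
        return "low_quality"
    return "ok"


def crop_box(face, img_w, img_h, aspect=0.75, margin=0.6):
    """Grow a face box (x, y, w, h) into a fixed-aspect crop inside the image.

    Returns (left, top, width, height) in whole pixels.
    """
    x, y, w, h = face
    cx, cy = x + w / 2, y + h / 2
    ch = h * (1 + 2 * margin)      # pad above and below the face
    cw = ch * aspect               # width follows from the target aspect ratio
    if cw > img_w:                 # shrink to fit, keeping the aspect ratio
        cw, ch = img_w, img_w / aspect
    if ch > img_h:
        cw, ch = img_h * aspect, img_h
    # Center on the face, then slide the box back inside the image if needed.
    left = min(max(cx - cw / 2, 0), img_w - cw)
    top = min(max(cy - ch / 2, 0), img_h - ch)
    return int(left), int(top), int(cw), int(ch)


def sort_photos(src_dir, out_dir):
    """Copy or crop every *.jpg in src_dir into a per-category folder in out_dir."""
    import cv2  # lazy import: the helpers above do not need OpenCV

    # Frontal-face Haar cascade bundled with the opencv-python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    out_dir = Path(out_dir)
    for path in sorted(Path(src_dir).glob("*.jpg")):
        img = cv2.imread(str(path))
        if img is None:  # imread returns None for files it cannot decode
            bucket, faces = "unreadable", ()
        else:
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            # Variance of the Laplacian: low values suggest a blurry image.
            sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
            bucket = categorize(len(faces), sharpness)
        dest = out_dir / bucket
        dest.mkdir(parents=True, exist_ok=True)
        if bucket == "ok":
            left, top, cw, ch = crop_box(faces[0], img.shape[1], img.shape[0])
            cv2.imwrite(str(dest / path.name), img[top:top + ch, left:left + cw])
        else:
            shutil.copy2(path, dest / path.name)
```

Calling `sort_photos("photos", "sorted")` (both paths hypothetical) would leave cropped single-face photos in `sorted/ok` and everything else in its own folder, so only the non-`ok` folders would need a manual look. The thresholds would almost certainly need tuning against a sample of the real photos.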