Hacker News
by jaclaz 2641 days ago
>You have probably seen https://thispersondoesnotexist.com/ by Philip Wang.

>But have you ever wondered if they made photos that look like you? Let's check!

>We collected the huge dataset of 428526 fake generated photos and extracted their facial parameters with https://github.com/ageitgey/face_recognition. Now you can match your image against fake faces and compare with the closest matches. Enjoy!
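
For readers wondering what "match your image against fake faces" could look like in practice, here is a minimal sketch. The face_recognition library represents each face as a 128-dimensional encoding and compares encodings by Euclidean distance. The site does not say how it searches its dataset, so the brute-force nearest-neighbour search below is an assumption. The function name `closest_matches`, the file names, and the toy 3-dimensional vectors are all hypothetical.

```python
import heapq
import math

def closest_matches(query, gallery, k=3):
    """Return the k gallery entries nearest to `query` by Euclidean distance.

    `gallery` maps a label (hypothetical, e.g. a file name) to an encoding
    vector of the same length as `query`.
    """
    scored = ((math.dist(query, enc), label) for label, enc in gallery.items())
    return heapq.nsmallest(k, scored)

# Toy 3-dimensional vectors standing in for real 128-dimensional encodings.
gallery = {
    "fake_001.jpg": [0.10, 0.20, 0.30],
    "fake_002.jpg": [0.90, 0.80, 0.70],
    "fake_003.jpg": [0.12, 0.18, 0.33],
}
matches = closest_matches([0.11, 0.19, 0.31], gallery, k=2)
for distance, label in matches:
    print(f"{label}: {distance:.3f}")
```

A linear scan like this is fine for a sketch. With hundreds of thousands of encodings a real service would more likely use some kind of index, but that is speculation too.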

Maybe it is just me, but how would I go if I wanted to harvest large numbers of "real" photos of "real" persons?

Having curious people uploading them to my website could be an easy way.

4 comments

https://thispersondoesnotexist.com/ only exists because someone already collected enough real photos of real persons to have a neural network learn their distribution well enough to generate new samples.

Harvesting even more photos is not really necessary at this point, and in any case, scraping them off the web would be faster than creating a novelty website.

>Harvesting even more photos is not really necessary at this point, and in any case, scraping them off the web would be faster than creating a novelty website.

But scraping them from the web says nothing about the source, even if you manage to remove all stock photos.

This way it is IMHO more likely that it is a "real" photo, most probably uploaded by a "real" user, and the site also has the sender's IP address.

Moreover, most photos you can find on the web have had their EXIF information removed by the host, which may not be the case for a photo uploaded directly by a casual user.
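
To make the EXIF point concrete, here is a rough sketch of how one could check whether a JPEG still carries an EXIF block. It only walks the segment headers looking for an APP1 segment that starts with the `Exif` tag and does not parse any metadata. The function name `has_exif` and the two synthetic byte strings are hypothetical, made up for illustration.

```python
import struct

def has_exif(jpeg_bytes):
    """Return True if the JPEG data contains an APP1 segment tagged 'Exif'."""
    if jpeg_bytes[:2] != b"\xff\xd8":           # every JPEG starts with SOI
        return False
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return False                         # not at a marker: give up
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):               # start of scan / end of image
            return False
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False

# Synthetic headers only (no image data), built by hand for the example.
with_exif = (b"\xff\xd8\xff\xe1" + struct.pack(">H", 12)
             + b"Exif\x00\x00" + b"\x00" * 4 + b"\xff\xd9")
stripped = (b"\xff\xd8\xff\xe0" + struct.pack(">H", 7)
            + b"JFIF\x00" + b"\xff\xd9")
print(has_exif(with_exif), has_exif(stripped))
```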

As I see it, scraping them off the web is good for quantity but not so much for quality; this (completely hypothetical) approach would give less quantity but, IMHO, better-quality data.

I definitely tried it with more junk images I had lying around than real photos, so there's that. There comes a moment when you need to know which face a computer would say most resembles a piece of sushi... So I'm not 100% sure about the quality.

> how would I go if I wanted to harvest large numbers of "real" photos of "real" persons?

I guess you could just steal them from Flickr?

http://fortune.com/2019/03/12/millions-of-flickr-photos-were...

>Maybe it is just me, but how would I go if I wanted to harvest large numbers of "real" photos of "real" persons?

Use APIs or scraping to collect profile photos from Facebook, Twitter, Google, Gravatar, etc. You'll get a lot of non-person photos, but havetheyfaked.me probably does too.

Yep, but as said above, that would be "big data" that needs de-duplicating and comes with no additional (reliable) parameters.
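
As a rough idea of the simplest form of that de-duplication step, the sketch below drops byte-for-byte identical files by hashing their contents. It would not catch resized or re-encoded copies of the same photo, which need some perceptual comparison (not shown here). The function name `deduplicate` and the sample data are hypothetical.

```python
import hashlib

def deduplicate(images):
    """Keep the first occurrence of each distinct byte string, in order."""
    seen = set()
    unique = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, data))
    return unique

# Placeholder byte strings standing in for downloaded image files.
scraped = [
    ("a.jpg", b"portrait-bytes-1"),
    ("b.jpg", b"portrait-bytes-2"),
    ("a_copy.jpg", b"portrait-bytes-1"),
]
kept = [name for name, _ in deduplicate(scraped)]
print(kept)
```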

Then you will need some AI (or whatever) to remove non-human or unsuitable photos (position, lighting, etc.), whilst this method would almost guarantee only "portraits" or "upper torso" pictures of humans.

What I tried to say wasn't that this is the "only" method, but that it is one of the "easy" ones with a high probability of getting "reliable" data.

Maybe all the images from thispersondoesnotexist.com are coming from havetheyfaked.me?