I set up this super simple ‘Which Face Is Real?’ (http://www.whichfaceisreal.com/) style challenge. Click the row to show the answers. You might need to zoom out.
> I get 100% reliably with the first link, and got 4/5 on the cropped version so far.
Looking at whichfaceisreal: how much time do you have to spend on each decision, and would your success rate change if you didn't know in advance that exactly 1 of 2 was generated? It's easy to say 100% reliable, but I find myself really having to dig deep with my eyes to search for small tells*, and you have to know what those tells are up front before you start looking.
* - Often the tells are as minuscule as some ringing around the hair, which could just as easily be compression artifacts on a real photo.
For the first link (game.html), normally 5-10s, but much longer or shorter for some of them. For the second (game_cropped.html), it takes much longer, like a minute or so, except when the real image contains something distinctive StyleGAN2 can't do. For whichfaceisreal.com, which uses the original StyleGAN and only offers two options, none hand-picked, it takes me a second or two.
I thought StyleGAN2's teeth were one of the biggest upgrades to the face proper. The original's were pretty bad, but removing the phase artefact issue seems to have made a huge difference.
Here are some examples of teeth I found pretty decent:
Also, after watching the video from the StyleGAN2 team https://drive.google.com/file/d/1f_gbKW6FUUHKkUxciJ_lQx29mCq... I now know that the original StyleGAN, whose images are apparently used for this "game", produces faces with "water droplet" and phase artifacts, so I was able to spot a few fakes just by looking for those things.
How is it speculative? The background does not always allow for the discrimination to be made, but in the random sample of ~20 faces I looked at, it was the main factor in maybe 1/4 of the cases (I was 100% accurate on this random sample). Of course, my random sample is not your random sample. We could probably do a controlled experiment to get at these kinds of attributions systematically.
I might also say that a second major discriminating factor is skin texture, especially at boundaries.
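For anyone wanting to sanity-check the scores quoted in this thread (4/5 on the cropped version, ~20/20 on a random sample), here is a rough sketch of how likely pure guessing would be to do that well. It assumes each trial is an independent choice with a fixed chance of being right; the function name `p_at_least` and the default chance of 0.5 (a two-image choice, as on whichfaceisreal.com) are my own assumptions, and a row with more options would need a smaller `p`.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of getting k or more correct out of n trials
    when every guess is right with probability p (binomial tail).

    p = 0.5 is an assumption: it models a blind guess between two images.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 4 or more out of 5 by blind guessing between two options
print(p_at_least(4, 5))    # 0.1875

# 20 out of 20 by blind guessing between two options
print(p_at_least(20, 20))  # about 9.5e-07
```

So under these assumptions 4/5 is not strong evidence on its own, while 20/20 is very hard to get by luck. It says nothing about which tell (background, skin texture, teeth) did the work; that would still need the controlled experiment.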