For many generative models, this is on its way to becoming a standard: using humans as judges of generated material. It is not limited to computer vision either. I am about to use this technique to judge the sanity of text generated by a Transformer model for a paper that I am writing (with a small group).
There are also attempts to properly standardize it; one of them is called HYPE [0], and there are big names like Fei-Fei Li and Michael Bernstein behind it.
Not sure they are fooling anyone. It's more like "is our generated image good enough for a human to recognize what it is, just to get rid of an annoying pop-up?". If there were actual consequences to getting it right or wrong, people would pay more attention, I'm sure.
Exactly. That is another way to improve accuracy once you have already done it the "regular" way. You can look up synthetic image generation and its benefits for model accuracy and optimization.