Is it though? The human correctly interpreted the image. The problem is that the image was, well, not "real". Humans have a limit to figuring out what is real and what is not, based on experience.
I think the point is that these models are often hyped as being proof that we've reproduced human visual systems, and adversarial examples that humans can still resolve are evidence against that.
When the adversarial examples for humans MATCH the adversarial examples for image classifiers, that would be evidence of having reproduced a biological system.