More on this point: computer vision systems are (currently) far more reliant than human vision on the presence of a number of identifying features in the high frequency details of images.
They are looking for a significant subset of some identifying group of highly localised features:
think a large number of small things, rather than a small number of large things;
think colour gradients rather than colours.
These sorts of high frequency pieces of information can be placed into images in a way that is imperceptible to humans, but that scream at computer vision neural networks.
A sign could say no right turn to people and no left turn to machines.
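
To make the "large number of small things" idea concrete, here is a toy sketch in Python with NumPy. Everything in it is an assumption made for illustration: the "classifier" is a single linear score whose weights are a hypothetical checkerboard pattern, the labels are just strings, and the image is a plain grey ramp. Real vision networks are far more complicated, but the arithmetic is the same in spirit: a change of under one percent per pixel, repeated over thousands of pixels, adds up to a large change in the score.

```python
import numpy as np

def checkerboard(size):
    """+1/-1 pattern that flips at every pixel: the highest frequency the grid can hold."""
    rows, cols = np.indices((size, size))
    return np.where((rows + cols) % 2 == 0, 1.0, -1.0)

def toy_label(image, weights, bias):
    """Hypothetical linear 'classifier': a weighted sum of pixels, thresholded at zero."""
    score = float(np.sum(weights * image) + bias)
    return "no left turn" if score > 0 else "no right turn"

SIZE = 64           # image is SIZE x SIZE pixels, values in [0, 1]
EPSILON = 2 / 255   # per-pixel change: two grey levels out of 255

# Assumed toy model: it responds only to pixel-level (high frequency) detail.
weights = checkerboard(SIZE)
bias = -1.0

# A smooth, low frequency image: a horizontal ramp from dark grey to light grey.
clean = np.tile(np.linspace(0.2, 0.8, SIZE), (SIZE, 1))

# Add a faint checkerboard. Each pixel moves by only EPSILON, but all 4096
# tiny changes line up with the weights and sum to roughly 4096 * EPSILON.
perturbed = clean + EPSILON * checkerboard(SIZE)

print(toy_label(clean, weights, bias))      # no right turn
print(toy_label(perturbed, weights, bias))  # no left turn
print(np.max(np.abs(perturbed - clean)))    # largest change to any single pixel
```

The smooth ramp contributes essentially nothing to the score because the alternating weights cancel it out, which is the "colour gradients rather than colours" point in miniature: this toy model ignores the broad brightness of the image and reacts only to the pixel-to-pixel pattern.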