Hacker News
by timomaxgalvin 522 days ago
It's easy to make something work once the example has gone from being outside the training data to inside it.
2 comments

Definitely. But I also tried with a picture of an absurdist cartoon drawn by a family member, complete with (carefully) handwritten text, and the analysis was absolutely perfect.
A simple test: take one of your own photos, something interesting, feed it into an LLM, and let it describe the image in words. Then use an image generator to recreate the image from that description. It works like back-translation, image->text->image, and it shows how much the models really understand images and text.
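If you want to script that round trip, here is a rough sketch of the loop. It assumes you supply the two model calls yourself: `describe` and `generate` are hypothetical placeholders for whatever captioning model and image generator you use, not functions from any real library. The final comparison between the original and the regenerated image is still done by eye.

```python
from typing import Callable, Tuple


def round_trip(
    image: bytes,
    describe: Callable[[bytes], str],   # hypothetical: image -> text description
    generate: Callable[[str], bytes],   # hypothetical: text description -> image
) -> Tuple[str, bytes]:
    """Run image -> text -> image and return the description and the new image."""
    description = describe(image)
    regenerated = generate(description)
    return description, regenerated
```

Keeping the model calls as parameters means the same loop works with any pair of models, and you can swap in stubs to check the plumbing before spending money on API calls.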
I wouldn't blame a machine for failing at something that at first glance looks like an optical illusion...