Hacker News
by Houshalter 4044 days ago
That's from the "Nearest Caption in the Training Dataset", which means it found the most similar image in the training set, and that image had that caption.
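To make that reading concrete, here is a minimal sketch of what a nearest-caption lookup could look like. Everything in it is an assumption for illustration: the thread does not say how similarity is measured, so cosine similarity over feature vectors is swapped in, and the names `cosine_similarity`, `nearest_caption`, and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    # Assumed similarity measure; the actual system may use something else.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_caption(query_features, training_set):
    """Return the caption attached to the most similar training image.

    No caption is generated here: the result is copied verbatim from
    whichever training example is closest to the query.
    """
    _, best_caption = max(
        training_set,
        key=lambda item: cosine_similarity(query_features, item[0]),
    )
    return best_caption


# Hypothetical toy data: (feature vector, human-written caption) pairs.
training_set = [
    ([0.9, 0.1, 0.0], "a dog running on grass"),
    ([0.1, 0.8, 0.2], "a plate of food on a table"),
    ([0.0, 0.2, 0.9], "a man riding a wave on a surfboard"),
]

query = [0.8, 0.2, 0.1]
print(nearest_caption(query, training_set))  # a dog running on grass
```

The point of the sketch is only that a nearest caption is retrieved, not written by the model, so it can describe the neighbouring image rather than the query image.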

No. That is not how it works. Read the papers again.
It very clearly says "Nearest Caption in the Training Dataset". The generated labels are below it.
What papers?