by Mizza 1211 days ago
The chain-of-thought prompting in section 4.5 is extremely interesting to me, but it looks like they're missing a test group: what is the performance if the image is simply described and the task is then evaluated using only the text of the description, rather than the description combined with the image?
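To make the missing group concrete, here is a rough sketch of the comparison I have in mind. Everything in it is an assumption for illustration: `describe` and `answer` are hypothetical stand-ins for whatever model calls the authors used, and `compare_conditions` is just a name I made up. It is not the paper's evaluation code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (image identifier, question, gold answer); all placeholders for illustration
Example = Tuple[str, str, str]


def compare_conditions(
    describe: Callable[[str], str],
    answer: Callable[[Optional[str], str, str], str],
    examples: List[Example],
) -> Dict[str, float]:
    """Return accuracy for description+image vs. description-only.

    `describe(image)` and `answer(image_or_None, description, question)`
    are hypothetical callables supplied by the caller.
    """
    correct = {"description_plus_image": 0, "description_only": 0}
    for image, question, gold in examples:
        # Step 1: the model describes the image in text.
        description = describe(image)
        # Existing condition: description is combined with the image.
        if answer(image, description, question) == gold:
            correct["description_plus_image"] += 1
        # Proposed extra group: the image is withheld, text only.
        if answer(None, description, question) == gold:
            correct["description_only"] += 1
    n = len(examples)
    return {name: (hits / n if n else 0.0) for name, hits in correct.items()}
```

If the two accuracies came out close, that would suggest the description is carrying most of the signal; a large gap would suggest the image still matters in the second step.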