by simonw 276 days ago
Yeah, I've been disappointed in GPT-5 for OCR - Gemini 2.5 is much better on that front: https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-cod...

For images in general, nothing comes close to Gemini 2.5 for understanding scene composition. The Gemini 2.5 models perform segmentation, so you can even ask for masks or bounding boxes of arbitrary objects.
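
For anyone who wants to try the bounding box part, here is a rough sketch of the post-processing step. It assumes the model replies with JSON where each item has a `label` and a `box_2d` in `[ymin, xmin, ymax, xmax]` order on a 0-1000 scale, which is how I understand the Gemini docs describe it; check the current docs before relying on that. The reply string and the helper names `to_pixel_box` and `parse_boxes` are made up for illustration, and no API call is made here.

```python
import json


def to_pixel_box(box_2d, width, height):
    """Convert [ymin, xmin, ymax, xmax] on an assumed 0-1000 scale
    to (left, top, right, bottom) in pixels."""
    ymin, xmin, ymax, xmax = box_2d
    return (
        round(xmin * width / 1000),
        round(ymin * height / 1000),
        round(xmax * width / 1000),
        round(ymax * height / 1000),
    )


def parse_boxes(reply_text, width, height):
    """Parse a JSON reply into (label, pixel_box) pairs."""
    items = json.loads(reply_text)
    return [
        (item["label"], to_pixel_box(item["box_2d"], width, height))
        for item in items
    ]


# Hypothetical reply text, not real model output.
reply = '[{"box_2d": [100, 250, 500, 750], "label": "dog"}]'

# Assumed image size of 1200x800 pixels.
boxes = parse_boxes(reply, 1200, 800)
print(boxes)
```

The resulting tuples are in the (left, top, right, bottom) order that most drawing and cropping libraries expect, so the main thing to watch is the y-before-x ordering in the model's output.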