by brookst 652 days ago
This. Images passed to LLMs are typically downsampled to something like 512x512, because that’s perfectly good for feature extraction. Reading text out of them would require much larger inputs, so that the text is still legible after resizing.
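To put rough numbers on it, here is a back-of-the-envelope sketch. The assumptions are mine, not any particular model's pipeline: the image is shrunk so its longer side is 512 px with aspect ratio preserved, and text needs to be about 8 px tall to stay readable. Both function names are made up for illustration.

```python
import math


def scaled_text_height(image_w: int, image_h: int, text_px: float,
                       target: int = 512) -> float:
    """Text height in pixels after shrinking the image so its longer side is `target`.

    Assumption: aspect ratio is preserved and images are never upscaled.
    """
    scale = min(target / max(image_w, image_h), 1.0)
    return text_px * scale


def min_side_for_legible_text(image_w: int, image_h: int, text_px: float,
                              min_legible_px: float = 8.0) -> int:
    """Smallest longer-side size that keeps text at least `min_legible_px` tall.

    Assumption: 8 px is a rough legibility floor, not a measured threshold.
    """
    longest = max(image_w, image_h)
    return math.ceil(longest * min_legible_px / text_px)


if __name__ == "__main__":
    # A 1920x1080 screenshot with 14 px body text.
    print(round(scaled_text_height(1920, 1080, 14), 2))  # 3.73
    print(min_side_for_legible_text(1920, 1080, 14))     # 1098
```

So under those assumptions, 14 px text in a 1080p screenshot ends up under 4 px tall at 512, and you would need roughly double that resolution on the long side to keep it readable.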