Hacker News
by brookst, 652 days ago
This. Images passed to LLMs are typically downsampled to something like 512x512, because that's perfectly good for feature extraction. Reading text would require much larger images, so that the text is still legible after downsampling.
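Some rough arithmetic makes the point concrete. This is a sketch, not how any particular model preprocesses images. It assumes the image is scaled to fit inside a 512x512 box while keeping its aspect ratio (some pipelines crop or pad instead). It also assumes a US Letter page scanned at 300 DPI and 12pt text. The 20 px "legible" threshold is a hypothetical number chosen only for illustration.

```python
import math

def fit_scale(width_px: int, height_px: int, target: int = 512) -> float:
    """Scale factor that fits an image inside a target x target box, preserving aspect ratio."""
    return min(target / width_px, target / height_px)

def em_height_px(font_pt: float, dpi: int, scale: float = 1.0) -> float:
    """Approximate pixel height of a font's em box (1 pt = 1/72 inch), after rescaling."""
    return font_pt * dpi / 72 * scale

def min_long_side(glyph_px: float, font_pt: float, dpi: int, long_side_px: int) -> int:
    """Smallest long-side resolution that keeps the em box at least glyph_px tall."""
    return math.ceil(long_side_px * glyph_px / em_height_px(font_pt, dpi))

# Assumption: US Letter (8.5 x 11 in) at 300 DPI -> 2550 x 3300 px, 12pt body text.
page_w, page_h = 2550, 3300
scale = fit_scale(page_w, page_h)
before = em_height_px(12, 300)
after = em_height_px(12, 300, scale)
print(f"scale={scale:.3f}, 12pt em height: {before:.1f}px -> {after:.1f}px")

# Hypothetical threshold: keep the em box at least 20 px tall.
print("long side needed:", min_long_side(20, 12, 300, page_h))
```

Under these assumptions, a full page shrinks by about 6.5x. 12pt text ends up roughly 8 px tall, which is fine for "this is a page of text" but poor for actually reading it. Keeping even 20 px of glyph height needs a long side of about 1320 px, which is already several times the pixel count of 512x512.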