Hacker News
by mingtianzhang 234 days ago
A VLM can already take both the document images and the query and produce an answer directly. Do we still need the intermediate OCR step?