Hacker News
by mingtianzhang, 234 days ago
A vision-language model (VLM) can already take both the document images and the query as input and produce an answer directly. Do we still need the intermediate OCR step?