Hacker News
by tensor 480 days ago
100% this. Combining traditional OCR with VLMs that can work with bounding boxes, so that you can correlate the two outputs, is the way to go.
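To make "correlate the two" concrete, here is a rough sketch of one way it could be done: match each VLM region to the OCR region it overlaps most (by intersection-over-union) and flag whether the two texts agree. The `Box` type, the `correlate` helper, and the 0.5 threshold are all made up for illustration; real OCR engines and VLMs each have their own output formats and coordinate conventions that you would need to normalize first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical normalized region: text plus pixel corners (x0, y0)-(x1, y1)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def area(b: Box) -> float:
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    ih = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def correlate(ocr_boxes, vlm_boxes, min_iou=0.5):
    """Pair each VLM box with its best-overlapping OCR box.

    Returns (vlm_box, ocr_box, texts_agree) tuples. VLM boxes with no OCR
    box above min_iou are skipped (assumed threshold, tune for your data).
    """
    pairs = []
    for v in vlm_boxes:
        best = max(ocr_boxes, key=lambda o: iou(o, v), default=None)
        if best is not None and iou(best, v) >= min_iou:
            pairs.append((v, best, v.text.strip() == best.text.strip()))
    return pairs


ocr = [Box("Total: 42.00", 10, 10, 110, 30), Box("Date", 10, 50, 60, 70)]
vlm = [Box("Total: 42.00", 12, 11, 108, 29), Box("Logo", 200, 200, 250, 250)]
matches = correlate(ocr, vlm)
```

Disagreements between the paired texts are the interesting cases: they are where you would route to a human or a second pass, since either the OCR misread a character or the VLM hallucinated.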