by sampton 246 days ago
You can train a CNN to find the bounding boxes of text first, then run a VLM on each box.
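
Rough sketch of what I mean, in case it helps. Everything named here is made up for illustration: `detect_boxes` stands in for whatever CNN text detector you trained, `run_vlm` for whatever VLM call you use, and the image is just a list of pixel rows so the snippet runs without any libraries. Sorting boxes top-to-bottom, left-to-right is my own assumption about reading order, not something the detector gives you.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]
# Rows of pixel values; a stand-in for a real image type (assumption).
Image = Sequence[Sequence[int]]


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region described by `box` out of `image`."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def read_text(
    image: Image,
    detect_boxes: Callable[[Image], Iterable[Box]],   # hypothetical CNN detector
    run_vlm: Callable[[List[List[int]]], str],        # hypothetical VLM call
) -> List[Tuple[Box, str]]:
    """Stage 1: find text boxes. Stage 2: run the VLM on each crop."""
    # Assumed reading order: sort by top edge, then by left edge.
    boxes = sorted(detect_boxes(image), key=lambda b: (b[1], b[0]))
    return [(box, run_vlm(crop(image, box))) for box in boxes]
```

You would call it as `read_text(image, my_detector, my_vlm)` and get back a list of `(box, text)` pairs. Passing the two models in as plain callables keeps the pipeline independent of any particular detector or VLM.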