Hacker News
by
sampton
246 days ago
You can train a CNN to find the bounding boxes of text first, then run a VLM on each box.
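Rough sketch of the glue code for that two-stage idea, assuming you already have the two models. `detect_boxes` (the trained CNN) and `read_crop` (the VLM call) are hypothetical callables you would supply, not real library APIs, and the image is a plain list of pixel rows standing in for whatever image type you actually use. The `pad` margin is also my assumption; a little context around each box may help the VLM, but that is untested here.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom are exclusive
Image = List[List[int]]          # grayscale pixel rows; stand-in for a real image type


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def read_text_regions(
    image: Image,
    detect_boxes: Callable[[Image], Sequence[Box]],  # hypothetical: the trained CNN detector
    read_crop: Callable[[Image], str],               # hypothetical: the VLM run on one crop
    pad: int = 0,                                    # assumed extra margin around each box
) -> List[Tuple[Box, str]]:
    """Stage 1: detect text boxes. Stage 2: run the reader on each cropped box."""
    height = len(image)
    width = len(image[0]) if image else 0
    results = []
    # Sort top-to-bottom, then left-to-right, as a rough reading order.
    for left, top, right, bottom in sorted(detect_boxes(image), key=lambda b: (b[1], b[0])):
        # Grow the box by the margin, clamped to the image bounds.
        padded = (
            max(0, left - pad),
            max(0, top - pad),
            min(width, right + pad),
            min(height, bottom + pad),
        )
        results.append((padded, read_crop(crop(image, padded))))
    return results
```

Each VLM call only sees one small crop, so the detector decides what gets read and in what order.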