Hacker News
by htrp 480 days ago
VLMs can't replace OCR one-to-one. Most hosted multimodal models seem to have a classical (Tesseract-based) OCR step in their inference loop.
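
For illustration, here is a rough sketch of what an OCR pre-step in front of a VLM could look like. It is only a guess at the general shape, not how any hosted model is known to work. `answer_with_ocr_hint`, `run_ocr`, and `run_vlm` are hypothetical names: `run_ocr` stands in for a classical OCR engine (e.g. a Tesseract wrapper) and `run_vlm` for a multimodal model client.

```python
from typing import Callable


def answer_with_ocr_hint(
    image: bytes,
    question: str,
    run_ocr: Callable[[bytes], str],       # hypothetical: classical OCR engine
    run_vlm: Callable[[bytes, str], str],  # hypothetical: multimodal model call
) -> str:
    """Run classical OCR first, then pass the text to the VLM as a hint."""
    ocr_text = run_ocr(image).strip()
    if ocr_text:
        # The model still sees the image; the OCR text is extra context only.
        prompt = (
            f"{question}\n\n"
            f"Text detected by OCR (may contain errors):\n{ocr_text}"
        )
    else:
        # Nothing detected: fall back to the plain question.
        prompt = question
    return run_vlm(image, prompt)


if __name__ == "__main__":
    # Stub engines so the sketch runs without any OCR or model installed.
    def fake_ocr(_image: bytes) -> str:
        return "INVOICE #123"

    def fake_vlm(_image: bytes, prompt: str) -> str:
        return prompt

    print(answer_with_ocr_hint(b"", "What is the invoice number?", fake_ocr, fake_vlm))
```

The two engines are passed in as callables so either one can be swapped or stubbed out; the stubs above just echo the prompt so you can see what the model would receive.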