Hacker News
by htrp 480 days ago
VLMs can't replace OCR one-to-one. Most hosted multimodal models seem to have a classical OCR (Tesseract-based) step in their inference loop.