by fzysingularity 480 days ago
That's a tough one to answer right now, but to be perfectly honest, we're still 2-3 orders of magnitude behind traditional OCR in terms of characters per watt (chars/W).

That said, VLMs are extremely powerful visual learners with LLM-like reasoning capabilities, which makes them more versatile than OCR in practically all imaging domains.

Within a few years, I think we'll see models that are more cost-performant thanks to distillation, quantization, and the many other tricks for reducing inference overhead.