|
|
|
|
|
by fzysingularity
480 days ago
|
|
That's a tough one to answer right now, but to be perfectly honest, we're off by 2-3 orders of magnitude in terms of chars/W. That said, VLMs are extremely powerful visual learners with LLM-like reasoning capabilities making them more versatile than OCR for practically all imaging domains. In a matter of a few years, I think we'll essentially see models that are more cost-performant via distillation, quantization and the multitude of tricks you can do to reduce the inference overhead. |
|