by sunami-ai 466 days ago
Making Transformers the same cost as CNNs (which are used in character-level OCR, as opposed to image-patch-level) is a good thing. The problem with CNN-based character-level OCR is not the recognition models but the detection models. In a former life, I found a way to increase detection accuracy, and therefore overall OCR accuracy, and used that as an enhancement on top of Amazon and Google OCR. It worked really well. But the transformer approach is more powerful, and if it can be done for $1 per 1000 pages, that is a game changer, IMO, at least for incumbents offering traditional character-level OCR.

It certainly isn't the same cost if you measure it in the non-subsidized dollars needed for the Transformer compute, aka infra.

CNNs trained specifically for OCR can run in real time on compute as small as a mobile device.

A bit of a tangent, but aren’t CNNs still dominating over ViTs among computer vision competition winners?
I haven't watched that space very closely, but IMO ViTs have a lot of untapped potential, since compared to CNNs they let the model learn and understand complex relations in the data. Where this matters, I expect it to matter a lot. I don't think OCR is the best example of that: understanding the surrounding context helps, but I don't think it's that critical for performance.