by menaerus 470 days ago
It certainly isn't the same cost once you express it as the non-subsidized dollars needed for the compute (i.e., the infrastructure) that Transformers require.

CNNs trained specifically for OCR can run in real time on compute as small as a mobile device.

1 comment

A bit of a tangent, but aren’t CNNs still dominating over ViTs among computer vision competition winners?
I haven't watched that space very closely, but IMO ViTs have a lot of untapped potential, since compared to CNNs they let the model learn complex relations in the data. Where this matters, I expect it to matter a lot. I don't think OCR is the best example of that, though: understanding the surrounding context helps, but I don't think it's that critical for performance.