|
|
|
|
|
by alaanor
125 days ago
|
|
There was so many OCR models released in the past few months, all VLM models and yet none of them handle Korean well. Every time I try with a random screenshot (not a A4 document) they just fail at a "simple" task. And funnily enough Qwen3 8B VL is the best model that usually get it right (although I couldn't get the bbox quite well). Even more funny, whatever is running on an iphone locally on cpu is insanely good, same with google's OCR api. I don't know why we don't get more of the traditional OCR stuff. Paddlepaddle v5 is the closest I could find. At this point, I feel like I might be doing something wrong with those VLMs. |
|
I've thought of open sourcing the wrapper but havent gotten around to it yet. I bet claude code can build a functioning prototype if you just point it to "screen_ai" dir under chrome's user data.