by junhoyeo 968 days ago
Yup! But I'm still exploring options (any recommendations would be welcome!). Here are some candidates I'm considering:

- https://github.com/mindee/doctr

- https://github.com/open-mmlab/mmocr

- https://github.com/PaddlePaddle/PaddleOCR (honestly I don't know Mandarin so I'm a bit stuck)

- https://github.com/clovaai/donut -- While it's primarily an "OCR-free document understanding transformer," I think it's worth experimenting with. Think I can sort this out by letting the LLM reason through it multiple times (although this will impact performance)

- Yesterday I got a suggestion to consider https://github.com/kakaobrain/pororo -- I don't think development is still active, but the results are pretty great on Korean text
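
For what it's worth, here's a rough sketch of the "reason through it multiple times" idea from the donut bullet, assuming it means running the same extraction several times and keeping the most common answer. `majority_answer` and `extract` are names I made up for illustration; they are not part of donut or any LLM client, and `extract` stands in for whatever model call ends up being used.

```python
from collections import Counter
from typing import Callable, List


def majority_answer(extract: Callable[[str], str], document: str, passes: int = 3) -> str:
    """Run a (hypothetical) extraction function several times and keep the most common answer."""
    if passes < 1:
        raise ValueError("passes must be at least 1")
    # Each pass is one full model call, so the cost grows linearly with `passes`.
    answers: List[str] = [extract(document).strip() for _ in range(passes)]
    # most_common(1) returns [(answer, count)]; on a tie the answer seen first wins.
    return Counter(answers).most_common(1)[0][0]
```

Usage would look like `majority_answer(my_model_call, "receipt.png", passes=3)`, where `my_model_call` is any function that takes a document reference and returns the extracted text. Every extra pass is another model call, which is where the performance hit comes from.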