by Oras
565 days ago
Thank you. This is a mix of OCR and LLM; I was wondering whether there might be a library that avoids that. A better approach would be to use Textract, as it maintains the flow, for example when a table spans multiple pages. By the way, Tesseract is not that good at extracting accurate data from tables; use it with caution, especially in a financial context. I have made an open-source tool to show the data missing from Tesseract and EasyOCR output:
https://github.com/orasik/parsevision/