by cpursley 752 days ago
Any tips on effectively getting financial data out of PDFs and into a RAG system (especially data contained in tables)? It needs to run locally, not via some proprietary cloud PDF parsing thingy. That's the nut I'm currently trying to crack.
2 comments

https://github.com/VikParuchuri/marker is solid, but it's slow and needs GPU(s) to be practical.
You might find my library useful: https://github.com/Filimoa/open-parse