Hacker News
by martincollignon 692 days ago
Have you tried https://github.com/VikParuchuri/marker ?
2 comments

For my use case, Marker works pretty well overall, but it has issues with tables: merged cells, misplaced headers, and so forth. I'm currently extracting Polish PDFs that are not scanned.

Compared to Azure Document Intelligence, Marker is really cheap when self-hosted (assuming you fall under the license requirements), but it does not produce high-quality data. YMMV.

Working on improving tables soon (I'm the author of marker)
Glad to hear that :) Thanks for developing Marker!
2nd that. Marker works pretty well as an async internal service for us! Thanks!
Yeah, the header stuff (and empty cells) for tables needs some work.
Marker worked pretty well for me in my limited testing. They also have a hosted solution:

https://www.datalab.to