Hacker News
by mkaszkowiak 685 days ago
For my use case, Marker seems to work pretty well overall, but it has issues with tables: merged cells, misplaced headers, and so forth. I'm currently extracting Polish PDFs that are *not* scanned.

Compared to Azure Document Intelligence, Marker is really cheap when self-hosted (assuming you fall under the license requirements), but it does not produce high-quality data. YMMV.

2 comments

Working on improving tables soon (I'm the author of marker)
Glad to hear that :) Thanks for developing Marker!
2nd that. Marker works pretty well as an async internal service for us! Thanks!
Yeah, the header stuff (and empty cells) for tables needs some work.