Hacker News
by mpeg 878 days ago
Same here, fitz is great. It does well enough out of the box that I can apply a few simple heuristics (joining or splitting paragraphs where it makes a mistake, for example), extract drawings and such, and get pretty close to 100% accuracy on the output.
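
To give a rough idea of the kind of paragraph-joining heuristic I mean, here is a minimal sketch. It assumes blocks shaped like the tuples PyMuPDF returns from page.get_text("blocks"), i.e. (x0, y0, x1, y1, text, block_no, block_type). The function name and the punctuation/lowercase rule are my own illustration, not anything built into fitz, and you would tune the rule per document.

```python
import re

def join_split_paragraphs(blocks):
    """Merge text blocks that look like one paragraph split in two.

    Each block is assumed to look like a PyMuPDF "blocks" tuple:
    (x0, y0, x1, y1, text, block_no, block_type), where block_type 0 is text.
    """
    paragraphs = []
    for block in blocks:
        if block[6] != 0:  # skip non-text (image) blocks
            continue
        # Collapse the line breaks inside a block into single spaces.
        text = re.sub(r"\s+", " ", block[4]).strip()
        if not text:
            continue
        # Hypothetical heuristic: the previous paragraph has no terminal
        # punctuation and this block starts lowercase, so treat it as a
        # continuation rather than a new paragraph.
        if (
            paragraphs
            and not paragraphs[-1].endswith((".", "!", "?", ":"))
            and text[0].islower()
        ):
            paragraphs[-1] += " " + text
        else:
            paragraphs.append(text)
    return paragraphs
```

With a real file you would feed it something like fitz.open("file.pdf")[0].get_text("blocks"); the file name there is just a placeholder.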

The only thing it doesn't do is table detection (neither does pdfminer.six), but there are plenty of other ways to handle tables.
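
One such way, as a sketch only: cluster words into rows by their vertical position and split each row into cells wherever there is a wide horizontal gap. This assumes word tuples shaped like the ones from PyMuPDF's page.get_text("words"), i.e. (x0, y0, x1, y1, word, block_no, line_no, word_no). The function name and the y_tol and gap thresholds are made up for illustration and would need tuning for a real layout.

```python
def words_to_rows(words, y_tol=3.0, gap=15.0):
    """Group positioned words into table rows and cells.

    words: tuples of (x0, y0, x1, y1, word, block_no, line_no, word_no).
    y_tol: max difference in y0 for two words to share a row (assumed value).
    gap:   min horizontal gap that starts a new cell (assumed value).
    """
    rows = []  # each entry is [row_y, [word tuples]]
    for w in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(w[1] - rows[-1][0]) <= y_tol:
            rows[-1][1].append(w)
        else:
            rows.append([w[1], [w]])

    table = []
    for _, row_words in rows:
        row_words.sort(key=lambda w: w[0])  # left to right
        cells = [row_words[0][4]]
        prev_x1 = row_words[0][2]
        for w in row_words[1:]:
            if w[0] - prev_x1 > gap:
                cells.append(w[4])       # wide gap: new cell
            else:
                cells[-1] += " " + w[4]  # small gap: same cell
            prev_x1 = w[2]
        table.append(cells)
    return table
```

This only works for simple grid-like tables with consistent spacing; anything with merged cells or wrapped text needs more than this.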