Hacker News
PDF Hell: Why is extracting data still a nightmare? (unstract.com)
3 points by naren87 122 days ago
2 comments

I found that Claude Sonnet 4.6 solves all of this very easily.

No workflows and zero setup: you just give it a PDF containing photos of text and it spits out a perfect `docx` (and not just in English).

> Text in a PDF file is ... lacks any logical or semantic structure.

Checks "PDF".

Checks "lack".

Hmm.