PDF Hell: Why is extracting data still a nightmare?

	PDF Hell: Why is extracting data still a nightmare? (unstract.com)
	3 points by naren87 122 days ago

2 comments

I found that Claude Sonnet 4.6 solves all of this very easily.

No workflows and zero setup: you just give it a PDF containing photos of text and it spits out a perfect `docx` (and not just in English).

> Text in a PDF file is ... lacks any logical or semantic structure.

Checks "PDF".

Checks "lack".

Hmm.