Hacker News
by willvarfar 2340 days ago
It isn’t in the interests of word processors to let documents round-trip through PDF. If you look at the PDFs the mainstream word processors generate, you’ll see some of them actively trying to stop text extraction. It’s like an obfuscation arms race: they include white-on-white text, and they jump all over the page when positioning text, so that no whole words occur in the source. Sad but true.
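To see why this hurts extraction, note that reading the drawn characters in content-stream order no longer gives you the text. An extractor has to drop the invisible decoys and re-sort what is left by position. The following is a minimal sketch of that idea, assuming a hypothetical `Glyph` record (character, position, width, fill color) has already been pulled out of the page. It ignores real PDF parsing, fonts, rotation and kerning, and it is not how any particular word processor or extraction tool actually works.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    # Hypothetical record for one drawn character; not a real PDF library type.
    char: str
    x: float      # left edge, in page units
    y: float      # baseline; in PDF coordinates y grows upward
    width: float  # advance width of the glyph
    fill: tuple   # RGB fill color, 0..1 per channel

WHITE = (1.0, 1.0, 1.0)

def naive_extract(glyphs):
    # Concatenate glyphs in the order they were drawn (content-stream order).
    return "".join(g.char for g in glyphs)

def positional_extract(glyphs, background=WHITE, space_gap=1.0):
    # Drop decoy glyphs painted in the background color (white-on-white).
    visible = [g for g in glyphs if g.fill != background]
    # Sort top-to-bottom (higher y first), then left-to-right.
    # Assumes every glyph on one line shares exactly the same baseline y.
    visible.sort(key=lambda g: (-g.y, g.x))
    lines, current, last = [], [], None
    for g in visible:
        if last is not None and g.y != last.y:
            # New baseline: finish the current line.
            lines.append("".join(current))
            current, last = [], None
        if last is not None and g.x - (last.x + last.width) > space_gap:
            # A horizontal gap wider than space_gap is treated as a word break.
            current.append(" ")
        current.append(g.char)
        last = g
    if current:
        lines.append("".join(current))
    return "\n".join(lines)

# "Hi there" drawn in scrambled order, plus one white decoy "X".
BLACK = (0.0, 0.0, 0.0)
SAMPLE = [
    Glyph("e", 32, 700, 5, BLACK),
    Glyph("H", 0, 700, 6, BLACK),
    Glyph("X", 10, 700, 5, WHITE),
    Glyph("t", 14, 700, 4, BLACK),
    Glyph("r", 28, 700, 4, BLACK),
    Glyph("i", 6, 700, 3, BLACK),
    Glyph("e", 23, 700, 5, BLACK),
    Glyph("h", 18, 700, 5, BLACK),
]

print(naive_extract(SAMPLE))       # -> eHXtrieh
print(positional_extract(SAMPLE))  # -> Hi there
```

Reading in stream order yields gibberish plus the decoy, while filtering by fill color and sorting by position recovers the words. Real extractors also have to cope with text rendering modes, clipping and overlapping glyphs, which is where the arms race comes in.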