HN Mirror

Hacker News


	by maxerickson 2339 days ago
	Books have about the same amount of semantic information as PDFs. It's probably just habit. I think it's less a lack of agreement than the fact that there aren't really universal document structures. There are relatively useful chunks, like paragraphs, that are more or less universal (at least for a given language), but those don't need much structure to be clear.


willvarfar 2339 days ago

It isn’t in the interests of word processors to round-trip through PDF. If you look at the PDFs the mainstream word processors generate, you can see some of them actively trying to stop text extraction. It’s like an obfuscation arms race. They include white-on-white text, and they jump all over the page when positioning text so that no whole words occur in the source, etc. Sad but true.
