by hamburglar 2301 days ago
... and is much more difficult to extract text from than PDF, given that it's Turing-complete (hello, halting problem) and doesn't even restrict your output to a particular bounding box.
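A toy illustration of that reasoning, not of any real PostScript tooling. If a page is described by a program, the only general way to learn what text it draws is to run it. Because halting is undecidable, an extractor has to give up after some step budget. It also has to decide what to do with text drawn outside the page. Everything here is a hypothetical model: the generator-based "page program", `extract_text`, the step budget, and the US Letter page box. The sketch also assumes the program yields control on every execution step.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical model: a "page program" is a generator that yields once per
# execution step, either None (pure computation) or an (x, y, text) draw call.
DrawCall = Tuple[float, float, str]
PageProgram = Callable[[], Iterator[Optional[DrawCall]]]


def extract_text(program: PageProgram, max_steps: int,
                 bbox: Tuple[float, float, float, float]) -> Optional[List[str]]:
    """Run the program for at most max_steps steps.

    Returns the strings drawn inside bbox, or None if the budget ran out.
    In general we cannot know whether the program would ever have stopped.
    """
    x0, y0, x1, y1 = bbox
    found: List[str] = []
    steps = 0
    for event in program():
        steps += 1
        if steps > max_steps:
            return None  # gave up: no general way to decide halting
        if event is None:
            continue  # a computation step that drew nothing
        x, y, text = event
        if x0 <= x <= x1 and y0 <= y <= y1:
            found.append(text)  # keep only text that lands on the page
    return found


def well_behaved():
    yield (72.0, 700.0, "Hello")
    yield (72.0, 680.0, "world")


def off_page():
    yield (72.0, 700.0, "visible")
    yield (-500.0, -500.0, "hidden")  # nothing stops a program drawing here


def never_halts():
    yield (72.0, 700.0, "partial")
    while True:  # spins forever without drawing anything else
        yield None


LETTER = (0.0, 0.0, 612.0, 792.0)  # assumed page box, in points
print(extract_text(well_behaved, 1000, LETTER))  # ['Hello', 'world']
print(extract_text(off_page, 1000, LETTER))      # ['visible']
print(extract_text(never_halts, 1000, LETTER))   # None
```

Any real extractor has to make the same two choices somewhere: how long to run before giving up, and whether off-page text counts as "text on the page".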