Hacker News
by cdevs 3219 days ago
We do a decent amount of work in this area: lots of web scraping and data extraction. Unfortunately, the PDFs we work with don't follow a strict format, or even a loose one. I wish they were government forms, or forms of any kind. Would you say the PDFs you're looking at have some kind of consistent format, and the readers are just hit or miss?