Hacker News
by acbart 3399 days ago
I had to use Tabula to extract a decade of SAT scores from PDFs for each state/year. It was a nightmare, but I managed it. More recently, I was hoping to do something similar with decennial census data, but it was just too much. Far, far too many groups publish data as PDFs, which is about as bad as deleting it outright. It's very upsetting.