by pratheekrebala 2729 days ago
FOIA officers will still find a way to send me scanned PDFs of spreadsheets.

This is the worst. Thankfully, there are tools like Tabula to extract the data.
I’ll defend this practice. It’s the only way of knowing for sure that you’re transmitting exactly the information you intend to send. Even copy/paste often picks up other stuff you don’t intend.
It's more of a way to prevent transmitting any easily accessible data at all. Using a human-auditable but still machine-readable format like CSV is what should be done.
It's not the only way, but maybe the easiest.

Having a data review process with automated integrity, confidentiality, and quality checks is not terribly difficult.
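For a sense of scale, here is a minimal Python sketch of such checks, assuming records arrive as dicts of strings. The field names (`record_id`, `date`, `amount`), the SSN-style regex, and the `review` function are hypothetical illustrations, not any agency's actual process. A real confidentiality screen would need far more than one pattern.

```python
import re

# Hypothetical pre-release checks; field names and rules are illustrative only.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # crude confidentiality screen
REQUIRED = ("record_id", "date", "amount")


def review(rows, expected_count):
    """Return a list of problems found in rows; an empty list means 'pass'."""
    problems = []
    # Integrity: the export should contain exactly the approved number of rows,
    # with no duplicated record IDs.
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    ids = [r.get("record_id") for r in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate record_id values")
    for i, row in enumerate(rows):
        # Quality: required fields must be present and non-empty.
        for field in REQUIRED:
            value = row.get(field)
            if value is None or not str(value).strip():
                problems.append(f"row {i}: missing {field}")
        # Confidentiality: flag anything that looks like an SSN in any field.
        for field, value in row.items():
            if SSN_LIKE.search(str(value)):
                problems.append(f"row {i}: SSN-like value in {field}")
    return problems
```

For example, a second row with an empty `date` and a note reading "call 123-45-6789" would come back as `["row 1: missing date", "row 1: SSN-like value in note"]`, which a reviewer can fix before anything is sent.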

But having a protocol to export the PDF to CSV is also a dead-easy way to confirm that only the relevant data is included. ASCII is just as “easy” as a scan, but it requires training clerks to be data-oriented rather than document-oriented.
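One way such a protocol could look, as a minimal sketch assuming Python's standard `csv` module: the export writes only an explicit allowlist of approved columns, so the reviewer only has to sign off on that short list. `RELEASED_FIELDS` and `export_release` are hypothetical names for illustration.

```python
import csv
import io

# Hypothetical allowlist: the only columns approved for release.
RELEASED_FIELDS = ["record_id", "date", "amount"]


def export_release(rows, fieldnames=RELEASED_FIELDS):
    """Write only the approved columns to CSV text; all other keys are dropped."""
    buf = io.StringIO()
    # extrasaction="ignore" skips any key not listed in fieldnames, so internal
    # columns (notes, staff names, ...) never reach the output.
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

An allowlist rather than a blocklist means a newly added internal column is withheld by default, and the resulting plain-text file is still easy for a clerk to read before it goes out.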

_ugh_ if only CSVs were standardized sooner and more completely. There are many encoding, delimiter, escaping and truncation conventions to deal with in real-world data.
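A minimal sketch of the usual workaround with Python's standard library, assuming the file is one of a few expected encodings and uses one of a few expected delimiters. The candidate lists and the `read_messy_csv` name are assumptions. `csv.Sniffer` only guesses, so it can still be wrong on odd inputs, and nothing here detects truncation.

```python
import csv
import io


def read_messy_csv(raw: bytes, encodings=("utf-8-sig", "cp1252")):
    """Decode with the first encoding that works, sniff the dialect, then parse."""
    for enc in encodings:
        try:
            text = raw.decode(enc)  # "utf-8-sig" also strips a leading BOM
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("none of the candidate encodings could decode the data")
    # Restrict the sniffer to delimiters we actually expect to see.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    return list(csv.reader(io.StringIO(text), dialect))
```

For instance, a cp1252-encoded, semicolon-delimited file containing `"Smith; J.";10,50` should parse to `["Smith; J.", "10,50"]`, with the quoted semicolon and the decimal comma left intact.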
Definitely. They are better than PDFs, but still have lots of room for improvement.
There are other ways to ensure this. And even by your own logic, it would then make sense to send both the Excel sheets and scanned PDFs of the Excel sheets, wouldn't it? It would be super comical, though.
PDFs can and do send non-visible data that wasn’t intended to be transmitted.
But as the public is typically entitled to that "other stuff" information as well, you're just obstructing.