
by mikemotherwell 2159 days ago

>In older systems

If only it were older systems!

I get data from a variety of undocumented sources, so I started logging exactly what I receive alongside the normalised data. Either I put the raw JSON into the database next to the values parsed out of it, or I put all downloaded data into a git repo and update the repo after every query.

Why raw JSON? I've had data come back as JSON that did not validate: unescaped quotes, fields with a key but no value, e.g. {"dog": , "cat": "yes"}, and just plain wrong data (a number field containing "no number at present"). Keeping the raw payload in the DB lets me fix the errors and reprocess.
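A minimal sketch of the database approach, assuming SQLite and a hypothetical `responses` table (the comment does not say which database or schema is actually used). The raw text is always stored; the parsed form and any parse error sit next to it so broken rows can be found, fixed, and reprocessed later. The `store_response` helper and the `example-feed` source name are illustrative only.

```python
import json
import sqlite3

def store_response(conn, source, raw_text):
    """Store the raw payload, plus its parsed form or the parse error."""
    try:
        # Re-serialise so the parsed column holds known-valid JSON.
        parsed = json.dumps(json.loads(raw_text), sort_keys=True)
        error = None
    except json.JSONDecodeError as exc:
        parsed = None
        error = str(exc)
    conn.execute(
        "INSERT INTO responses (source, raw, parsed, error) VALUES (?, ?, ?, ?)",
        (source, raw_text, parsed, error),
    )
    conn.commit()
    return error is None

# Hypothetical schema: one row per download, raw text is never discarded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses ("
    " id INTEGER PRIMARY KEY,"
    " fetched_at TEXT DEFAULT CURRENT_TIMESTAMP,"
    " source TEXT, raw TEXT NOT NULL, parsed TEXT, error TEXT)"
)

ok_good = store_response(conn, "example-feed", '{"dog": "no", "cat": "yes"}')
ok_bad = store_response(conn, "example-feed", '{"dog": , "cat": "yes"}')

# Rows that failed to parse are still there, ready to be fixed and reprocessed.
failed = conn.execute(
    "SELECT id, raw, error FROM responses WHERE error IS NOT NULL"
).fetchall()
```

Because the raw column is kept even when parsing fails, a later fix to the parser (or a manual correction of the payload) can be replayed over the stored rows without re-querying the source.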

With non-JSON data I've had a raft of other problems, and I find git works well there: the history shows what changed between downloads and caused the errors.
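The git variant could look roughly like this, assuming the `git` command-line tool is installed and a dedicated repository is used for downloads. The `snapshot` helper, the file name, and the committer identity are all made up for illustration; the comment does not describe the actual setup.

```python
import subprocess
from pathlib import Path

def git(repo, *args):
    """Run a git command inside repo and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

def snapshot(repo, filename, payload, message):
    """Write payload to repo/filename and commit it if the content changed."""
    repo = Path(repo)
    if not (repo / ".git").exists():
        repo.mkdir(parents=True, exist_ok=True)
        git(repo, "init", "--quiet")
    (repo / filename).write_text(payload, encoding="utf-8")
    git(repo, "add", "--", filename)
    if not git(repo, "status", "--porcelain", "--", filename).strip():
        return False  # identical to the previous download, nothing to commit
    git(
        repo,
        # Placeholder identity so the commit works without global git config.
        "-c", "user.name=fetcher", "-c", "user.email=fetcher@example.invalid",
        "commit", "--quiet", "-m", message,
    )
    return True
```

Calling `snapshot` after every query gives one commit per changed download, and `git log -p -- <filename>` then shows exactly what changed between fetches.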

When dealing with anyone else's data, it is vital not just to validate it but also to log it in a structured way that makes errors easier to discover.