by jzwinck 4336 days ago
I wondered about this too. At a guess, perhaps they were doing a straight database dump from a production system which had sensitive information as well as public data. They would then run a script to delete the sensitive columns before posting the dump.

This seems likely to have been broken at the design stage: systems should fail safe. The first-order fix might be to check the return value of the sanitizer script and refuse to upload if it failed. But a better solution would be a system that makes leaking private data much less likely in the first place: for example, copying only whitelisted columns (so that new sensitive columns added to the system are not dumped by default), or storing sensitive data in separate tables or even a separate database (which takes more work if levels of sensitivity change over time).
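
To make the first-order fix concrete, here is a minimal sh sketch. It assumes the sanitizer and the uploader are separate commands; `publish_dump` and its argument names are hypothetical, not anything from Mozilla's actual tooling.

```shell
#!/bin/sh
# Hypothetical helper: publish_dump RAW_FILE SANITIZER UPLOADER
# Sanitizes RAW_FILE into a temp file and uploads it only if the
# sanitizer exited with status 0. On any failure nothing is uploaded.
publish_dump() {
    raw=$1
    sanitizer=$2
    uploader=$3

    clean=$(mktemp) || return 1

    # Fail safe: a non-zero exit from the sanitizer aborts the whole job.
    if ! "$sanitizer" "$raw" > "$clean"; then
        echo "sanitizer failed; refusing to upload" >&2
        rm -f "$clean"
        return 1
    fi

    # Only the sanitized temp file is ever handed to the uploader.
    status=0
    "$uploader" "$clean" || status=$?
    rm -f "$clean"
    return "$status"
}
```

Usage would be something like `publish_dump dump.sql ./sanitize.sh ./upload.sh` (all three names made up). The uploader never sees the raw file, so a crashed sanitizer results in no upload rather than an unsanitized one.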

I've speculated here about the details to illustrate the point about systems design. Unfortunately, the glue code for these sorts of things is too often written with little or no error checking, so when something goes wrong, the system just proceeds through unknown or unvalidated states, as we see here. It doesn't help that the default language for cron and a lot of "supervisory" jobs tends to be Bash (or Dash) these days, where error checking is turned off by default.


Yep, naive bash scripts don't stop on failure and your non-sanitized file will happily get uploaded.

Does anyone know why Mozilla was posting database dumps (sanitized or otherwise) onto public servers?

It's useful to contributors who work on the site development.

Every unattended shell script should start with this:

    set -e
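
A small sketch of the difference it makes. The two-line `job` below is a made-up stand-in for "sanitize, then upload", with `false` playing the failing sanitizer:

```shell
#!/bin/sh
# Two steps standing in for "sanitize, then upload"; the first one fails.
job='false
echo "uploaded unsanitized dump"'

# Without set -e the failure is ignored and the second step still runs.
lax_output=$(sh -c "$job")
lax_status=$?

# With set -e the shell exits at the first failing command.
strict_status=0
strict_output=$(sh -c "set -e
$job") || strict_status=$?

echo "lax:    status=$lax_status output='$lax_output'"
echo "strict: status=$strict_status output='$strict_output'"
```

The lax run reports success and prints the "uploaded" line. The strict run exits non-zero with no output. Note that `set -e` is not a complete safety net: failures in `if` conditions and on the left of `&&` or `||` don't trigger it, and in a pipeline only the last command's status counts, so explicit checks are still worth having on the steps that matter.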