Hacker News new | ask | show | jobs
by opportune 2575 days ago
I work with user data as part of my job.

We gladly set up large pipelines and infrastructure to let data flow from users, through message queues, into databases, and from there into analytics workflows. But we balk at the thought of this process being anything but unidirectional, or in implementing exportable logs to track how data is tranferred, combined, or analyzed.

If the way user data propagates through third parties where auditable and visible, it would definitely at least double the work of setting up user analytics. But, other industries make do with similarly powerful regulations. If you can't afford to let users see what you're doing with their data, should you be allowed to collect user-level metrics anyway?

We could also blacklist a limited set of data types, as is effectively done with HIPAA, to better enforce privacy. However, even HIPAA is not restrictive enough, and there is a whole subfield of academia engaged in privacy research which has shown that even HIPAA compliant (in the sense they don't contain certain columns of data) datasets can be used to reveal senstive information using relinkage against public forms of data [0, 1, 2, 3, 4]. But the tech industry is better equipped than any other industry to enforce algorithmic privacy and be good stewards of data. We just don't want to, because it's hard. Building structures up to building safety code is also hard (and, in some ways, too bureaucratic/poorly implemented. Sometimes private companies can actually copyright building code laws[5]), but it's good that we do it, in general.

[0] https://en.wikipedia.org/wiki/K-anonymity

[1] https://en.wikipedia.org/wiki/L-diversity

[2] https://en.wikipedia.org/wiki/T-closeness

[3] https://en.wikipedia.org/wiki/Differential_privacy famously used by Apple

[4] https://en.wikipedia.org/wiki/De-identification

[5] https://techcrunch.com/2019/04/09/can-the-law-be-copyrighted...

P.S. : I do think HIPAA, GDPR, etc. have their flaws. But that just means we should try to do better, rather than just blindly oppose any attempt to do better. The vast majority of privacy gains can be accomplished with the simplest changes: anonymization, pseudonymization, limits on time/spatial granularity, etc.