Hacker News
by fitblipper 1794 days ago
I think a huge chunk of bad privacy outcomes arise through data retention policies and aggregation across many sources.

To operate, a phone company might need to know where you are calling from and whom you are calling, a doctor's office might need to know your medical and contact info, an ISP might need to know your IP address, a dating website might need to know your IP address, your chat app might need to know your contact list, your GPS might need to know your precise location at this specific time.

Do they need to keep that information for years, though? And once all this info is aggregated, how personal is the information that can be learned?

1 comment

Really good point, fitblipper. I see this all the time.

Organizations have poor or no data retention policies, so they accumulate information they no longer need for many years. Meanwhile, they keep building new data processing capabilities that might leverage that old data, which lacks any context about what it was gathered for or how it can (or cannot) be used.

On the point of what degree of identification can be learned (i.e. how personal the information is), I'm constantly reminded by folks in the ML field that you can discern a huge amount about an individual, quite precisely, from what might initially look like anonymized data. That's one of the exploits that most concern privacy specialists when they talk about differential privacy and pseudonymization. Arguably we're not where we need to be yet, but thankfully there are a number of teams working on solutions to this.

By this I mean the data analysis part; the retention and policy issue is still one most companies need to do better on.
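
To make the re-identification point concrete, here is a minimal sketch of a linkage attack in Python. Everything in it is made up for illustration: the records, the names, and the choice of zip code, birth year and sex as quasi-identifiers are all assumptions, not data from any real system. The idea is only that a dataset with names stripped can still be joined against a second source that carries names, and any row whose combination of attributes is unique gets re-identified.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94110", "birth_year": 1985, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical second source that still carries names.
public = [
    {"name": "Alice", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carol", "zip": "94110", "birth_year": 1985, "sex": "F"},
    {"name": "Dana", "zip": "94110", "birth_year": 1985, "sex": "F"},
]

# Assumed quasi-identifiers shared by both sources.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymized_rows, public_rows):
    """Return (name, diagnosis) pairs for rows that match exactly one person."""
    names_by_key = defaultdict(list)
    for person in public_rows:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for row in anonymized_rows:
        candidates = names_by_key.get(key(row), [])
        # Only a unique match pins the row to one individual.
        if len(candidates) == 1:
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link(anonymized, public)
print(reidentified)
```

In this toy data, two of the three "anonymized" rows match exactly one named person, while the third stays ambiguous because two people share the same zip code, birth year and sex. The longer old data is retained and the more sources it can be joined against, the more rows tend to fall into the unique bucket.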