Really good point, fitblipper - I see this all the time.

Organizations have poor or no data retention policies, so they accumulate information they no longer need for many years. On the other side, they continue to build new data processing capabilities that might leverage that old data, which lacks any context as to what it was gathered for or how it can (or cannot) be used.

On the question of how much identifying detail can be learned (i.e. how personal the information is), I'm constantly reminded by folks in the ML field that you can discern a huge amount about an individual, quite precisely, from what might initially look like anonymized data. That's one of the exploits that most concern privacy specialists when they talk about differential privacy and pseudonymization. Arguably we're not where we need to be yet, but thankfully there are a number of teams working on solutions to this.

By this I mean the data analysis part; the retention and policy issue is still one that most companies need to do better on.
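
To make the "looks anonymized but isn't" point a bit more concrete, here's a rough sketch of the kind of check people run: strip the names, then count how many rows are still unique on a handful of ordinary-looking fields. The records and the field names (`zip`, `birth_year`, `sex`) are made up for illustration, and this is only a simple k-anonymity style count, not differential privacy or anything a real team would ship as-is.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1990, "sex": "M"},
    {"zip": "94105", "birth_year": 1972, "sex": "F"},
    {"zip": "94105", "birth_year": 1972, "sex": "M"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Return the smallest group size; 1 means at least one row is unique."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Return the rows that no other row matches on the given keys."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


keys = ("zip", "birth_year", "sex")
print("k =", k_anonymity(records, keys))
print("rows singled out:", len(unique_rows(records, keys)), "of", len(records))
```

In this toy set, three of the five rows are unique on those three fields, so anyone holding a second dataset with the same fields plus names could link them back. Checking on `zip` alone gives k = 2, which is the usual pattern: each extra field you keep shrinks the crowd a person can hide in.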