by obscura 1780 days ago
Nice article - I enjoyed reading it. The examples are interesting in that they show how difficult it is to cover all the scenarios one can encounter when trying to anonymise data - e.g., double encoding. However, Avast should have analysed the data, spotted issues like these, and fixed them. Of course, the problem is they didn't have the incentive to do so.
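To make the double-encoding point concrete, here is a minimal sketch in Python. It is not Avast's actual pipeline, which isn't shown here; the URL, the e-mail regex, and the function names are all made up for illustration. It only shows why a scrubber that decodes a URL once can miss personal data that was percent-encoded twice.

```python
import re
from urllib.parse import unquote

# Deliberately simple e-mail pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scrub_once(url: str) -> str:
    """Hypothetical naive scrubber: decode one layer, then redact e-mails."""
    return EMAIL.sub("[redacted]", unquote(url))


def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Percent-decode repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def scrub_all_layers(url: str) -> str:
    """Hypothetical stricter scrubber: peel every encoding layer first."""
    return EMAIL.sub("[redacted]", fully_decode(url))


# Made-up URL whose "next" parameter was percent-encoded twice.
url = ("https://example.com/login?next="
       "%252Faccount%253Femail%253Duser%2540example.com")

print(scrub_once(url))        # address survives as user%40example.com
print(scrub_all_layers(url))  # address is replaced by [redacted]
```

Even the stricter version only handles one encoding scheme and one kind of identifier, which is rather the point: every extra case has to be anticipated by someone who actually looks at the data.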

This whole issue is another example of how hard it is for all of us to make good decisions about privacy. Most people wouldn't think about privacy being a problem when using Avast. Even if you do read the privacy policy you actually can't be entirely sure what's being done with your data (which you indicated in your original October 2019 article). However, it appears that you're safe because the data will be anonymised. There's nothing more for you to do at this point other than trust that Avast is handling anonymisation correctly.

I wonder if it's actually possible to anonymise data effectively yet still make it useful. Based on literature such as the academic article you referred to [1] and another I looked at a long time ago [2], it seems to me that with enough seemingly unrelated data you can identify most people.

[1] De-anonymizing Web Browsing Data with Social Networks
[2] Robust De-anonymization of Large Sparse Datasets

2 comments

The issue isn’t Avast failing to anonymize the data sufficiently for this whole thing to be okay. It’s as if a locksmith and home security company had access to your home and used that access to sneak in and take photos of your family while they sleep, then sell them with little black bars over the eyes. Even if they anonymized it slightly better, the Avast leadership should go to jail.

It definitely is a hard problem. Clearly, if anybody at Avast had bothered to look at the data, they would have spotted the issues. But even with real effort, it’s hard to anonymize data reliably while keeping it useful.