Privacy is not just relevant to advertising. A huge number of research opportunities in the social sciences could benefit from shared data and help make the world better.
Here's the fallacy: people think that if they collect ALL the data, they will get the best results on whatever problem they're trying to solve.
Right now you're probably thinking: yeah, but there's this one problem that wouldn't have been solved without x or y...
But really it's just hoarding behavior. They're trying to collect it all. Statistical significance is usually reached very quickly, and past that point the extra collection is only doing harm to society.
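To put a rough number on "reached very quickly", here is a back-of-the-envelope sketch using the textbook sample-size formula for estimating a proportion, n = z^2 * p * (1 - p) / e^2. Strictly speaking this is a margin-of-error calculation rather than a significance test, and it assumes simple random sampling; the function name and defaults are my own, not anything from the thread.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, proportion=0.5):
    """Samples needed to estimate a proportion within +/- margin_of_error.

    Assumes simple random sampling and the normal approximation.
    proportion=0.5 is the worst case (largest variance).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    for e in (0.05, 0.03, 0.01):
        print(f"+/-{e:.0%}: {required_sample_size(e)} samples")
```

Under those assumptions, about 9,600 responses pin a percentage down to within one point at 95% confidence, no matter whether the population is a hundred thousand users or a hundred million. Everything collected beyond that buys very little extra precision.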
Show me what data you've collected and would like to share, including metadata, and if you ask nicely then most of the time I'd be more than happy to share it.
Things like emoji usage, page navigation, feature usage, etc. Ideally anonymous: no IP address, no user agent, etc. Just a byte or two, packed properly, can go a long way.
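As an illustration of how far "a byte or two" goes, here is a minimal sketch that packs one usage event into exactly two bytes. The field layout (a 10-bit feature ID, a 3-bit weekday, a 3-bit coarse count bucket) and all the names are hypothetical choices of mine; the point is only that a useful signal fits in 16 bits with no identifier attached.

```python
# Hypothetical 16-bit layout: [feature_id:10][weekday:3][count_bucket:3]
FEATURE_BITS = 10
WEEKDAY_BITS = 3
BUCKET_BITS = 3


def pack_event(feature_id, weekday, count_bucket):
    """Pack one event into 2 bytes. No IP, user agent, or timestamp."""
    if not 0 <= feature_id < (1 << FEATURE_BITS):
        raise ValueError("feature_id out of range")
    if not 0 <= weekday < 7:
        raise ValueError("weekday must be 0..6")
    if not 0 <= count_bucket < (1 << BUCKET_BITS):
        raise ValueError("count_bucket out of range")
    value = (
        (feature_id << (WEEKDAY_BITS + BUCKET_BITS))
        | (weekday << BUCKET_BITS)
        | count_bucket
    )
    return value.to_bytes(2, "big")


def unpack_event(payload):
    """Inverse of pack_event: returns (feature_id, weekday, count_bucket)."""
    if len(payload) != 2:
        raise ValueError("expected exactly 2 bytes")
    value = int.from_bytes(payload, "big")
    count_bucket = value & ((1 << BUCKET_BITS) - 1)
    weekday = (value >> BUCKET_BITS) & ((1 << WEEKDAY_BITS) - 1)
    feature_id = value >> (WEEKDAY_BITS + BUCKET_BITS)
    return feature_id, weekday, count_bucket


if __name__ == "__main__":
    payload = pack_event(feature_id=42, weekday=2, count_bucket=5)
    print(payload.hex(), unpack_event(payload))
```

Note that the weekday and the count bucket are deliberately coarse: an exact timestamp or an exact count would make the record much easier to link back to a person.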
The problem is that it's actually quite hard to reliably anonymize data, especially once you start combining data sets from multiple sources. That's the problem differential privacy tries to solve in a mathematically rigorous way.
See, for example, how researchers partially de-anonymized the Netflix Prize data by cross-referencing it with IMDb reviews.
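For anyone who hasn't seen what the "mathematically rigorous" part looks like in practice, here is a minimal sketch of the Laplace mechanism applied to a counting query. It assumes each person contributes at most one record (so the count has sensitivity 1) and that a trusted party holds the raw data; the function names and the epsilon value are illustrative, not from any particular library.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    Assumes adding or removing one person changes the true count by at
    most 1 (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    # Toy data: 1 means the user used some feature, 0 means they did not.
    usage = [1] * 300 + [0] * 700
    print(private_count(usage, lambda used: used == 1, epsilon=0.5))
```

A smaller epsilon means more noise and a stronger guarantee. The guarantee is only about how much any single person's presence can change the released number, which is exactly the scope that gets argued over in replies like the one below.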
It's not possible to reliably anonymize data and still be able to infer something from it; the idea is just too logically broken. The whole reason for anonymity is to make sure that the parties who want to infer something from the data can infer nothing about any individual, or about any set of individuals of a size decided by those seeking the protection. "Differential privacy", by contrast, assumes that "privacy" means not being able to infer information about relatively small sets of individuals, with the size decided by the parties who want to infer something from the data. It isn't, of course; it's a pretty dystopian use of the word privacy. That's why corporations love this stuff: privacy without privacy is a godsend to them.
Each of these could be stored separately, without metadata, and then aggregated with no problem. Things marked * could be left out, and some values could be randomly shifted up or down a bucket, and so on.
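One way to sketch the bucket randomization idea: each client keeps its true bucket with some probability and otherwise reports a uniformly random bucket, and the collector corrects for the known noise when aggregating. To be clear, this is a randomized-response variant that I'm swapping in for illustration, not literally "shift up or down by one", and the names and the 50% keep probability are my own assumptions.

```python
import random
from collections import Counter


def randomize_bucket(true_bucket, num_buckets, keep_prob, rng):
    """Client side: report the true bucket with probability keep_prob,
    otherwise report a bucket drawn uniformly at random."""
    if rng.random() < keep_prob:
        return true_bucket
    return rng.randrange(num_buckets)


def estimate_shares(reports, num_buckets, keep_prob):
    """Collector side: recover the population share of each bucket.

    Expected observed share = keep_prob * true_share
                              + (1 - keep_prob) / num_buckets,
    so invert that relation for every bucket.
    """
    counts = Counter(reports)
    total = len(reports)
    noise_share = (1 - keep_prob) / num_buckets
    return [
        (counts[bucket] / total - noise_share) / keep_prob
        for bucket in range(num_buckets)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy population: 60% of users in bucket 0, 40% in bucket 3.
    true_buckets = [0] * 30000 + [3] * 20000
    reports = [randomize_bucket(b, 5, 0.5, rng) for b in true_buckets]
    print([round(share, 3) for share in estimate_shares(reports, 5, 0.5)])
```

No single report can be taken at face value, since any bucket might be a random draw, yet the aggregate shares come out close to the truth once there are enough reports. Individual estimates can dip slightly below zero for empty buckets; that is noise, not a bug.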