by pmarreck 2406 days ago
I have never felt harmed or threatened by the idea that my anonymized data (health or otherwise) is being used by large organizations... assuming it is anonymized.

As far as I can see, there are many good uses of this data (some potentially profitable, such as selling to health insurance companies so they can better price their products and evaluate risks) and very few bad uses of this data.

Can someone please clarify for me exactly what the potential harm is here... using evidence and reason instead of conjecture and belief? Because until then, this all smells an awful lot like a conspiracy theory https://www.logicallyfallacious.com/tools/lp/Bo/LogicalFalla...

Here's an example: Google has had our data for literally decades now. What is the measurable, significant harm that has resulted? And if there is nothing, what catastrophes are yet possible where a single rogue bad actor or a group of them profits off the suffering of many and gets away with it?

Please explain to me my naiveté here.

10 comments

Because it is never going to be anonymized. They are going to call it anonymized, and then publicly apologize for it not being so a couple of times, while keeping business as usual. There is just no incentive for them to anonymize it.
More to the point, I'm not even sure that, in principle, it is possible to truly anonymize this data without spoiling the data science easter egg hunt. And the easter egg hunt is arguably the entire point of this kind of data collection.

You can do something like k-anonymizing the data and then destroying the original, personally identifiable data. But k-anonymity has its limits, too.
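To make the k-anonymity idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy table, the choice of `zip`, `age`, and `sex` as quasi-identifiers, and the helper names `k_of` and `generalize` (they are not from any library). It measures k before and after coarsening the quasi-identifiers.

```python
from collections import Counter

def k_of(rows, quasi_ids):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return min(groups.values())

def generalize(row):
    """Coarsen quasi-identifiers: 3-digit ZIP prefix and a 10-year age band."""
    out = dict(row)
    out["zip"] = row["zip"][:3] + "**"
    decade = row["age"] // 10 * 10
    out["age"] = f"{decade}-{decade + 9}"
    return out

# Hypothetical records; names are already stripped.
rows = [
    {"zip": "02138", "age": 34, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "age": 37, "sex": "F", "diagnosis": "flu"},
    {"zip": "02141", "age": 52, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02142", "age": 58, "sex": "M", "diagnosis": "diabetes"},
]
QUASI = ["zip", "age", "sex"]

k_raw = k_of(rows, QUASI)                           # every row is unique
k_gen = k_of([generalize(r) for r in rows], QUASI)  # rows now fall into groups of two
print(k_raw, k_gen)
```

Note that in this toy table both men in the second group share a diagnosis, so knowing someone is in that group reveals it even though the group has two members. That is one of the limits mentioned above.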

Every other strategy I know of assumes that it's OK to keep a private copy of the original data, which works well if we're talking about a scenario such as a source that needs to keep the raw data (like a health care provider) sharing the data with a semi-trustworthy external party such as a health researcher. But it doesn't address what I'm guessing is the main concern here, which is that, even if you accept for the sake of argument that Google currently has no intention to do gross things with the data, they can't make any promises that will hold indefinitely. It's a long-lived organization whose policies might change with any change in leadership, market, or even political conditions, so any promises they might make are simply meaningless in the long run. As they would be with any organization, regardless of the presence or absence of any present-day warm fuzzy feelings.

Basically the data, to be useful, has to be capable of uncovering correlations with all sorts of demographics. Those clues can de-anonymize it. If we take your name out of a data frame (so that we can call it "anonymous") but leave all sorts of other properties (those being the payload useful to data science), you may be nevertheless identifiable from that combination of properties, together with other info known about you from other tracking sources.
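A rough sketch of that kind of linkage, in Python. All of it is made up for illustration: the two tables, the people in them, the column names, and the helper `link`. It joins a "nameless" table to a second, named table on the properties they share and keeps only unambiguous matches.

```python
def link(anon_rows, public_rows, keys):
    """Join on shared quasi-identifiers; keep rows that match exactly one name."""
    index = {}
    for p in public_rows:
        index.setdefault(tuple(p[k] for k in keys), []).append(p["name"])
    hits = {}
    for i, a in enumerate(anon_rows):
        names = index.get(tuple(a[k] for k in keys), [])
        if len(names) == 1:  # unique match means the row is re-identified
            hits[i] = names[0]
    return hits

# Hypothetical "anonymized" health table: no names, but useful properties kept.
anon = [
    {"zip": "02138", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1970, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical second source that does carry names (e.g. another tracker).
public = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1990, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1970, "sex": "M"},
    {"name": "Dan Example", "zip": "02139", "birth_year": 1970, "sex": "M"},
]

reidentified = link(anon, public, ["zip", "birth_year", "sex"])
print(reidentified)
```

Here the first two rows get a name back; the third does not, only because two people happen to share its combination of properties.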
Yeah, I totally believe it can be anonymized, but when they attach geo data, age data, etc., you can be picked out of the crowd with just a bit of analysis. Plus I don't think most of them actually anonymize it. They just say they do. There's no one auditing them until a court order happens.
Every organization and project I've worked for, for companies much smaller than Google, has done its best to comply with eliminating PII.

https://www.gsa.gov/reference/gsa-privacy-program/rules-and-...

Just because YOU haven't felt harmed or threatened doesn't mean that others feel that way. I generally agree with you but with some caveats. For example, users should have the ability to opt-in/out of data collection. Additionally there are some actors I would choose NOT to share this data with. Google is one of them.
There's a difference between actual harm and perceived harm. Lots of people feel harmed and threatened by vaccines. I'm not saying you shouldn't have privacy, but claiming you "feel harmed and threatened" isn't really an answer to "what's the harm?"
Yeah, I don't understand this either. "I feel I was harmed." OK, where's your empirical reasoning to come to this conclusion?
> assuming it is anonymized.

In the age of Big Data, there's only one way for data to be anonymized -- it needs to be aggregated with all the other data, and the original individual data records need to be deleted.

And sadly that's never going to happen because those individual data records are valuable.
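For what "aggregate, then delete the individual records" could look like, here is a small Python sketch. The data, the `min_cell` threshold of 5, and the helper name `aggregate` are all assumptions for illustration, and clearing an in-memory list is only a stand-in for real deletion.

```python
from collections import Counter

def aggregate(rows, by, min_cell=5):
    """Publish only group counts, dropping groups smaller than min_cell."""
    counts = Counter(r[by] for r in rows)
    return {group: n for group, n in counts.items() if n >= min_cell}

# Hypothetical individual-level records: 6 in "north", 2 in "south".
records = [{"id": i, "region": "north"} for i in range(6)]
records += [{"id": i, "region": "south"} for i in range(6, 8)]

published = aggregate(records, "region", min_cell=5)

# Stand-in for deleting the originals; only the aggregate survives.
records.clear()

print(published)
```

The small "south" group is suppressed because a count of 2 says too much about the two people in it.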
If you research this by sprinkling curiosity-allocated time over the course of a year... you will not care. Because it isn't about what has already gone wrong. It isn't even about harm or risk. On the outside, it's about this: this information is me. It is mine. It is easy to disregard my dignity, my autonomy, my precious private personal traits, preferences, etc.

This, to me anyway, is lifeblood American identity stuff.

You might say "well, congrats on your private liberty, but you're sharing it right here for all companies to scoop up". But that's exactly the problem.

But look at how Google approaches this topic (were you even aware they had this data?) compared to Apple who advertises very clearly that they will use it in studies, etc. It's not only what you do but how you do it also. Transparency can go a long way.
Because this data exists, people will rely on it being accurate.

https://www.nytimes.com/2018/10/03/us/fitbit-murder-arrest.h...

> Please explain to me my naiveté here.

There is nearly zero incentive to actually anonymize your data, and anonymization doesn't make you anonymous.

This is a lesson we should have learned more than a decade ago[1], when AOL released their anonymized search data for research purposes, and thousands of people were trivially identified using it.

[1] https://en.wikipedia.org/wiki/AOL_search_data_leak

> I have never felt harmed or threatened by the idea that my anonymized data (health or otherwise) is being used

How anonymised is it again?

Because anonymous percentages can be interpreted as the likelihood that you yourself are those things, like a sort of quantum mechanical cat paradox. I believe it's also how our "collective consciousness" and language kind of works (you're outputting what you hope they'll be inputting based on what people typically will input). Basically, if anonymous stats show 99.9999% of people watch child porn, I can now accuse you beyond a reasonable doubt (roughly) of watching child porn. Or like if it became known that 80% of people were trans, I might stop going on Tinder. While I don't know _you_, given a bunch of data, I know "people like you", "your people", or whatever other phrase that half of people find offensive and the other half find accurate. Black people be black, almost by definition. Jews are Jews. I'm now seen as a racist white male KKK incel Trump supporting basement dwelling... Wait, we're getting off topic. The point is, society retardation aside, data will only ever be used to judge you to the best of its ability, and rarely in a way that will let you bang more hot women, but more frequently in a way that raises the prices of what you want or need or places you in jail. Cops will park where people "tend" to speed. They didn't know _you_ would be speeding, but ... they did. You are betrayed by your peers to evil forces.

If the people were given anonymous data that showed that 100% of the 2008 bankers were going on a cruise departing tomorrow, we could easily fix things. Well, that's what _they're_ doing to us.

You actually forgot my favorite one on this topic:

https://www.martinfowler.com/articles/bothersome-privacy.htm...

You're arguing something different. I'm arguing that a sufficiently anonymized version of my data is not demonstrably harmful. You're arguing that privacy in general is important, which I would not dispute.