| HN Mirror

	by Johnbot 58 days ago
	A lot of geolocation data on the market is anonymized, following medium-lived unique IDs that can't be mapped to other identifiers. The problem with that is that if you have precise locations, or enough samples that you can apply statistics to find precise locations, in many cases you can de-anonymize the IDs. You can purchase address and resident listings from a number of different data vendors, and by checking where the device returns to at night you can figure out its home address. Then, if you find information on the residents (work locations, schools, etc.), you can check whether the device goes where each resident of that address is likely to go, and you now have a pretty good idea of exactly who the device belongs to.
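The first step described above, guessing a home from where a device sits overnight, can be sketched roughly as follows. This is a minimal illustration, not any vendor's actual pipeline: the ping tuple format, the 22:00 to 06:00 "night" window and the 3-decimal grid are all assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ping format: (device_id, ISO-8601 local timestamp, lat, lon).
Ping = tuple[str, str, float, float]


def grid_cell(lat: float, lon: float, decimals: int = 3) -> tuple[float, float]:
    """Snap a coordinate to a grid; 3 decimals is roughly 110 m of latitude."""
    return (round(lat, decimals), round(lon, decimals))


def is_night(ts: str, start: int = 22, end: int = 6) -> bool:
    """True if the hour falls inside an assumed sleeping window (22:00 to 06:00)."""
    hour = datetime.fromisoformat(ts).hour
    return hour >= start or hour < end


def infer_home(pings: list[Ping]) -> dict[str, tuple[float, float]]:
    """Each device's most frequent night-time grid cell, taken as its likely home."""
    night_cells: dict[str, Counter] = {}
    for device_id, ts, lat, lon in pings:
        if is_night(ts):
            night_cells.setdefault(device_id, Counter())[grid_cell(lat, lon)] += 1
    return {dev: cells.most_common(1)[0][0] for dev, cells in night_cells.items()}


pings = [
    ("a", "2024-01-01T23:10:00", 40.71234, -74.00101),
    ("a", "2024-01-02T02:30:00", 40.71241, -74.00099),
    ("a", "2024-01-02T13:00:00", 40.75000, -73.98000),  # daytime, ignored
]
print(infer_home(pings))  # {'a': (40.712, -74.001)}
```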

10 comments

rockskon 58 days ago

There is no such thing as anonymized location data when you have someone's location where and when they sleep and work.

It's a rhetorical fiction the ad industry tells itself.

Terr_ 58 days ago

Right, there's probably no other phone in the world that typically stops for hours within 1000 feet of my bed and typically stops on Monday-Friday within 1000 feet of my work-desk.
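Framed in k-anonymity terms, the point is that combining home and work cells shrinks the set of devices sharing a profile, often down to one. A small sketch with hypothetical cell labels standing in for inferred night-time and weekday-daytime locations:

```python
from collections import Counter


def anonymity_sets(profiles: dict[str, tuple]) -> dict[str, int]:
    """For each device, how many devices share exactly the same quasi-identifier tuple."""
    counts = Counter(profiles.values())
    return {dev: counts[key] for dev, key in profiles.items()}


# Hypothetical (home cell, work cell) pairs per device.
profiles = {
    "a": ("h1", "w1"),
    "b": ("h1", "w2"),
    "c": ("h2", "w1"),
    "d": ("h1", "w2"),
}
home_only = {dev: (home,) for dev, (home, _work) in profiles.items()}

print(anonymity_sets(home_only))  # {'a': 3, 'b': 3, 'c': 1, 'd': 3}
print(anonymity_sets(profiles))   # {'a': 1, 'b': 2, 'c': 1, 'd': 2}
```

Device "a" blends in among three devices on home alone but is unique once work is added; "b" and "d" still collide (think housemates at the same office), which is where extra signals such as schools come in.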

mapt 58 days ago

Now think what Lavrenti Beria and an LLM could have done with that.

wafflemaker 58 days ago

Somebody once said that if Stalin had had access to television, he would never have had to kill 20+ million people. What would he do with all that data? No idea.

kspacewalk2 57 days ago

If all you've got is full political power and control over propaganda networks, you won't get the USSR. You'll get Hungary between 2010 and 2026. It works well, but in the critical moments when things start going wrong you need to kill people to maintain power, or else your nascent autocracy collapses as quickly as Orban's.

drysine 58 days ago

I'm no fun of Stalin, but this meme about 20+ million victims needs to be purged.

"The scholarly consensus affirms that archival materials declassified in 1991 contain irrefutable data far superior to sources used prior to 1991, such as statements from emigres and other informants.

Before the dissolution of the Soviet Union and the archival revelations, some historians estimated that the numbers killed by Stalin's regime were 20 million or higher. After the Soviet Union dissolved, evidence from the Soviet archives was declassified and researchers were allowed to study it. This contained official records of 799,455 executions (1921–1953), around 1.5 to 1.7 million deaths in the Gulag, some 390,000 deaths during the dekulakization forced resettlement, and up to 400,000 deaths of persons deported during the 1940s, with a total of about 3.3 million officially recorded victims in these categories. According to historian Stephen Wheatcroft, approximately 1 million of these deaths were "purposive" while the rest happened through neglect and irresponsibility. The deaths of at least 5.5 to 6.5 million persons in the Soviet famine of 1932–1933 are sometimes included with the victims of the Stalin era." [0]

https://en.wikipedia.org/wiki/Excess_mortality_under_Joseph_...

nacozarina 57 days ago

lol naturally the criminals were obsessed with honestly keeping comprehensive official records of their misdeeds

maximinus_thrax 57 days ago

> I'm no fun of Stalin

I would argue for the generality of this characterization

kevin_thibedeau 58 days ago

Only thing better to rule with is a network connected telescreen that monitors and issues orders to the proles.

nlitened 58 days ago

So Instagram and TikTok?

breppp 58 days ago

Pretty sure it would be hard to enslave these people through television

hightrix 58 days ago

Would it be? I'd argue the current US administration is entirely propped up by television. Hell, the president seems to "rule" based on what Fox News said last night.

xigoi 57 days ago

I’m pretty sure most phones have a higher location accuracy than 1000 feet.

Forgeties79 58 days ago

And with LLMs now it’s easier than ever to piece the parts together. Companies were doing it before we even knew what LLMs were capable of.

Edit: It's a rhetorical fiction the ad industry tells us.

abustamam 58 days ago

I think this raises the question of what anonymous data means. Sure, my visit to HN is "anonymous" in that it doesn't say "abustamam visited this site", but piece together all the other visits that have my "anonymous ID" and eventually it paints a pretty nice picture of who I am.

rockskon 58 days ago

Does it map to a single, identifiable person or something close enough that the distinction is meaningless?

Then it's not anonymous.

Simple as that.

abustamam 58 days ago

My point is that even completely anonymous data that conforms to what you just said can easily become de-anonymized when contextualized to other "anonymous" data.

rockskon 58 days ago

A marketer's definition of anonymized is worthless. It's a fantasy they want everyone else to believe in.

If it can be "de-anonymized" then it was never anonymous to begin with.

"De-anonymized" is quite literally an oxymoron.

abustamam 57 days ago

> A marketer's definition of anonymized is worthless. It's a fantasy they want everyone else to believe in.

I'm using your definition.

> Does it map to a single, identifiable person or something close enough that the distinction is meaningless?

Also

> If it can be "de-anonymized" then it was never anonymous to begin with.

Well sure, that's the point I was trying to make in my rhetorical question above. Individual pieces of data may be "anonymous" but put together with other anonymous data that can be traced to a single source and suddenly you can figure out quite easily who this person is. The data itself is still technically anonymous but it can be pieced together.

thfuran 57 days ago

Does that mean that no non-post-quantum encryption was ever actually encryption because in 20 years someone will be able to decrypt things?

teraflop 58 days ago

We should have learned this lesson 20 years ago when researchers were able to deanonymize a lot of the Netflix Prize dataset, which contained nothing except movie ratings and their associated dates.

https://arxiv.org/abs/cs/0610105

If movie ratings are vulnerable to pattern-matching from noisy external sources, then it should be obvious that location data is enormously more vulnerable.
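The linked paper's core idea can be sketched very roughly: score every record against the few facts the adversary knows, weight rare items more heavily, and only accept the top candidate if it stands clearly above the runner-up. The version below is a heavily simplified sketch, not the paper's exact algorithm; the binary similarity test, the 1/log(1 + n) weight, the 14-day tolerance and the 1.5 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter
from statistics import pstdev

# Hypothetical record format: movie_id -> (rating 1-5, day number of the rating).
Record = dict[str, tuple[int, int]]


def similar(a: tuple[int, int], b: tuple[int, int],
            rating_tol: int = 1, day_tol: int = 14) -> float:
    """1.0 if both the rating and the date roughly agree, else 0.0 (simplified)."""
    (ra, da), (rb, db) = a, b
    return 1.0 if abs(ra - rb) <= rating_tol and abs(da - db) <= day_tol else 0.0


def best_match(aux: Record, dataset: dict[str, Record], threshold: float = 1.5):
    """Return the record ID that clearly stands out as matching aux, or None."""
    # Rarely rated movies are more identifying: weight each by 1 / log(1 + #raters).
    support = Counter(m for rec in dataset.values() for m in rec)
    weights = {m: 1 / math.log(1 + n) for m, n in support.items()}
    scores = {
        rid: sum(weights[m] * similar(v, rec[m]) for m, v in aux.items() if m in rec)
        for rid, rec in dataset.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2 or pstdev(ranked) == 0:
        return None
    # Eccentricity: gap between the best and second-best score, in standard deviations.
    eccentricity = (ranked[0] - ranked[1]) / pstdev(ranked)
    return max(scores, key=scores.get) if eccentricity >= threshold else None


dataset = {
    "u1": {"m1": (5, 10), "m2": (3, 40), "m3": (4, 100)},
    "u2": {"m1": (5, 12), "m4": (2, 50)},
    "u3": {"m2": (1, 45), "m4": (2, 55)},
}
print(best_match({"m1": (5, 11), "m2": (3, 41), "m3": (4, 98)}, dataset))  # u1
print(best_match({"m4": (2, 52)}, dataset))  # None: u2 and u3 tie
```

The eccentricity check is what keeps a noisy guess from being reported as a match; swapping "movie on day d" for "place at hour h" gives the location analogue.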

totetsu 58 days ago

> In contrast to previous attacks on micro-data privacy [22], our de-anonymization algorithm does not assume that the attributes are divided a priori into quasi-identifiers and sensitive attributes. Examples include anonymized transaction records (if the adversary knows a few of the individual's purchases, can he learn all of her purchases?), recommendation and rating services (if the adversary knows a few movies that the individual watched, can he learn all movies she watched?), Web browsing and search histories [12], and so on. In such datasets, it is impossible to tell in advance which attributes might be available to the adversary;

Is location data high-dimensional, though?
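One way to approach the dimensionality question is to treat every (place, hour) combination as its own attribute, which makes a location trace a very sparse, very high-dimensional record, much like a ratings history. The quantity worth estimating is sometimes called unicity: the fraction of people singled out by just p points drawn from their own trace. A small sketch, assuming a hypothetical discretization into cell IDs and hour indices:

```python
import random

# Hypothetical discretization: a trace is a set of (cell_id, hour_index) points.
Point = tuple[str, int]


def is_unique(points: set[Point], traces: dict[str, set[Point]]) -> bool:
    """True if exactly one user's trace contains every one of the given points."""
    return sum(1 for trace in traces.values() if points <= trace) == 1


def unicity(traces: dict[str, set[Point]], p: int, rng: random.Random) -> float:
    """Fraction of users singled out by p points drawn at random from their own trace."""
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    hits = 0
    for user in eligible:
        sample = set(rng.sample(sorted(traces[user]), p))
        hits += is_unique(sample, traces)
    return hits / len(eligible)


traces = {
    "a": {("c1", 8), ("c2", 18)},
    "b": {("c1", 8), ("c3", 18)},
    "c": {("c4", 9)},
}
rng = random.Random(0)
print(unicity(traces, 2, rng))  # 1.0: each two-point trace is unique as a whole
```

On real data the interesting part is how quickly unicity approaches 1.0 as p grows; the toy traces here only make the behaviour easy to check.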

vovanidze 58 days ago

exactly. calling it 'anonymized' is pure security theater once you have enough data points to map out someone's daily routine.

waiting for legislation or eulas to fix this is a lost cause since adtech always finds a loophole. the fix has to be architectural. moving toward stateless proxies that strip device identifiers at the edge before they even hit upstream servers. if the payload never touches a persistent db there is literally nothing to de-anonymize. stateless infra is the only sane way forward
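A minimal sketch of the stateless edge idea, assuming a JSON payload and a hypothetical list of identifier fields. It drops those fields and coarsens coordinates in a pure function that stores nothing between calls:

```python
import json

# Hypothetical field names; a real SDK's payload would need its own list.
IDENTIFIER_FIELDS = {"device_id", "advertising_id", "ip", "user_agent"}


def strip_payload(raw: bytes, decimals: int = 2) -> bytes:
    """Drop identifier fields and coarsen lat/lon before forwarding upstream.

    Pure function: nothing is written anywhere or kept between calls.
    """
    payload = json.loads(raw)
    cleaned = {k: v for k, v in payload.items() if k not in IDENTIFIER_FIELDS}
    for key in ("lat", "lon"):
        if key in cleaned:
            # 2 decimals is roughly 1.1 km of latitude.
            cleaned[key] = round(cleaned[key], decimals)
    return json.dumps(cleaned, sort_keys=True).encode()


raw = b'{"device_id": "abc123", "lat": 40.71234, "lon": -74.00101, "event": "open"}'
print(strip_payload(raw))  # b'{"event": "open", "lat": 40.71, "lon": -74.0}'
```

The coarsening step matters as much as the field stripping: as the rest of this thread argues, precise coordinates on their own can be enough to re-identify someone, so removing explicit IDs alone may not be sufficient.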

microtonal 58 days ago

To be honest, I feel like this is where iOS and Android are failing us. Why is every app allowed to embed a bunch of trackers? Only blocking cross-app tracking on user request as iOS does is not enough (and data of different apps/websites can be correlated externally).

CPLX 58 days ago

Because we don’t enforce antitrust law in this country and the people that make those decisions profit from the ads.

chimeracoder 58 days ago

> To be honest, I feel like this is where iOS and Android are failing us. Why is every app allowed to embed a bunch of trackers? Only blocking cross-app tracking on user request as iOS does is not enough (and data of different apps/websites can be correlated externally).

Even if Google and Apple both want to commit to fighting this, it becomes a game of whack-a-mole, because there are all sorts of different ways to track users that the platforms can't control.

As an easy example: every time you share an Instagram post/video/reel, they generate a unique link that is tracked back to you so they can track your social graph by seeing which users end up viewing that link. (TikTok does the same thing, although they at least make it more obvious by showing that in the UI with "____ shared this video with you").
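The mechanism described above is simple to sketch: mint a unique token per share, remember who minted it, and record an edge whenever someone else opens that link. This is a hypothetical illustration, not Instagram's or TikTok's actual implementation; the URL, the `s` query parameter and the class are made up.

```python
import secrets
from urllib.parse import parse_qs, urlparse


class ShareLinkTracker:
    """Hypothetical sketch: per-share link tokens that reveal who shared with whom."""

    def __init__(self, base_url: str = "https://example.com/p/") -> None:
        self.base_url = base_url
        self.token_owner: dict[str, str] = {}     # token -> user who created the link
        self.edges: set[tuple[str, str]] = set()  # (sharer, viewer) pairs

    def make_link(self, post_id: str, sharer: str) -> str:
        token = secrets.token_urlsafe(8)
        self.token_owner[token] = sharer
        return f"{self.base_url}{post_id}?s={token}"

    def record_view(self, link: str, viewer: str) -> None:
        token = parse_qs(urlparse(link).query).get("s", [None])[0]
        sharer = self.token_owner.get(token)
        if sharer is not None and sharer != viewer:
            self.edges.add((sharer, viewer))


tracker = ShareLinkTracker()
link = tracker.make_link("post42", sharer="alice")
tracker.record_view(link, viewer="bob")
tracker.record_view(link, viewer="carol")
print(sorted(tracker.edges))  # [('alice', 'bob'), ('alice', 'carol')]
```

Nothing in the platform's permission model is involved here: the graph is built entirely server-side from which token shows up in which request.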

rolph 58 days ago

I'm not sure about "allowed"; perhaps "required" is closer.

why would someone include tech that makes people think twice about using the app, unless it is required if you want to "sell" in a particular venue.

if you're developing geolocation-based apps, location tracking is a core function.

a calendar absolutely does not require location tracking beyond which side of the prime meridian you're on.
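Data minimization can be expressed as picking the coarsest precision a feature actually needs. A small sketch, assuming plain rounding of coordinates; the 111 km-per-degree figure is an approximation for latitude, and east-west cell width shrinks toward the poles:

```python
def hemisphere(lon: float) -> str:
    """Coarsest option: east or west of the prime meridian."""
    return "E" if lon >= 0 else "W"


def coarsen(lat: float, lon: float, decimals: int) -> tuple[float, float]:
    """Round coordinates; fewer decimals means a bigger, less identifying cell."""
    return round(lat, decimals), round(lon, decimals)


def cell_height_km(decimals: int) -> float:
    """Approximate north-south size of a rounding cell (about 111 km per degree of latitude)."""
    return 111.0 * 10 ** -decimals


print(hemisphere(-0.1276))               # W
print(coarsen(51.5072, -0.1276, 1))      # (51.5, -0.1)
print(round(cell_height_km(1), 1))       # 11.1
```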

nickburns 58 days ago

> if you're developing geolocation-based apps, location tracking is a core function.

But the subsequent sale of that data is not, and that sale is what's being discussed here.

rolph 58 days ago

and the reason that data is available for sale starts with forced data collection, if you want to participate in an app store as a developer.

you can't sell what you don't have, unless you lie lower than a rug.

fix the data collection problem and a second order effect of no data for sale emerges.

nickburns 58 days ago

Are you suggesting Android/iOS app developers are forced into data collection somehow? If so, how? I'm genuinely curious.

LeifCarrotson 58 days ago

> why would someone include tech that makes people think twice about using the app, unless it is required if you want to "sell" in a particular venue.

Because the overwhelming majority of people don't think twice about this tech.

I do, and that's why I use a lot of web tools or old-fashioned phone calls, but most people think metadata=unimportant and assume that the purpose of the app is what it does for them rather than to gather their personal information for sale.

uxhacker 58 days ago

How is this legal under the GDPR? There are clear examples in the Citizen Lab document of a user being tracked inside the EU from outside.

Is there not also a requirement for clear consent? I.e., a weather app can’t track your precise location?

sroussey 58 days ago

Companies exist that de-anonymize other data brokers' data. This lets the other data brokers claim they have anonymized data while end users get everything.
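At its simplest, the de-anonymization step described at the top of the thread is a join between inferred home and work cells and a purchased resident listing. A minimal sketch, assuming hypothetical listing fields and cells that have already been inferred from pings:

```python
def link_devices(homes: dict[str, str], works: dict[str, str],
                 residents: list[dict]) -> dict[str, str]:
    """Name a device only when exactly one listed resident matches both of its cells.

    homes / works: device_id -> inferred night-time / weekday-daytime cell
    residents: hypothetical listing rows with "name", "home_cell", "work_cell"
    """
    matches = {}
    for device, home in homes.items():
        candidates = [
            r["name"]
            for r in residents
            if r["home_cell"] == home and r["work_cell"] == works.get(device)
        ]
        if len(candidates) == 1:
            matches[device] = candidates[0]
    return matches


homes = {"d1": "h1", "d2": "h1", "d3": "h9"}
works = {"d1": "w1", "d2": "w2"}
residents = [
    {"name": "Ann", "home_cell": "h1", "work_cell": "w1"},
    {"name": "Bob", "home_cell": "h1", "work_cell": "w2"},
    {"name": "Cy", "home_cell": "h1", "work_cell": "w2"},
]
print(link_devices(homes, works, residents))  # {'d1': 'Ann'}
```

Device "d2" stays unnamed because two listed residents match; more signals (schools, gyms, and so on) would be needed to separate them.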

ImPostingOnHN 58 days ago

you could probably run an anonymization company at the same time as you run a de-anonymization company

gessha 58 days ago

Best of both worlds - legal and profitable /s

nzach 58 days ago

> enough samples that you can apply statistics to find precise locations, in many cases you can de-anonymize the IDs

I think a lot of people don't realize the power of a big enough sample size. With enough samples even something pretty innocent looking like your daily step counter could make you identifiable.

As far as I know we don't have large enough databases to make this happen in practice, but I don't think this is impossible in the future.
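One way to put a number on "large enough" is in bits: singling out one person among N needs about log2(N) bits of identifying information, roughly 33 bits for a population of around 8 billion. Each independent observation (a day's step count, a location ping) contributes some bits, so what matters is how many observations are collected per person. The sketch below assumes independent, uniformly distributed fingerprints, which is a simplification:

```python
import math


def bits_to_single_out(population: int) -> float:
    """Bits of identifying information needed to pick one person out of a population."""
    return math.log2(population)


def expected_collisions(population: int, bits: float) -> float:
    """Expected number of *other* people sharing your fingerprint, assuming
    fingerprints are independent and uniformly spread over 2**bits values."""
    return (population - 1) / 2 ** bits


print(round(bits_to_single_out(8_000_000_000), 1))  # 32.9
print(expected_collisions(2**20 + 1, 20))           # 1.0
```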

jandrewrogers 58 days ago

How large are you estimating is "large enough"?

jandrewrogers 58 days ago

Location and identity are inextricably linked. You can't destroy identity without also destroying location and location is critical for myriad purposes.

The analytic reconstruction of identity from location is far more sophisticated than the scenarios people imagine. You don't need to know where they live to figure out who they are. Every human leaves a fingerprint in space-time.

nickburns 58 days ago

> and location is critical for myriad purposes.

It's not though.

Critical for myriad elective purposes? Sure.

jandrewrogers 58 days ago

Only if you consider the entire concept of logistics in civilization as "elective".

xphos 58 days ago

Seems hyperbolic; we had logistics that functioned extremely well before we had customer location data for sale on 3rd-party sites.

philipallstar 58 days ago

If you re-read the comment they didn't say that selling it was intrinsic.

xphos 58 days ago

The article is about privacy-tracking spyware cookies. I think making statements in that context about how modern logistics doesn't work without location data implies you mean location data from those sources. I mean, I suppose it doesn't have to, but then it just feels off topic, no?

nickburns 58 days ago

I don't follow what you mean by 'logistics in civilization' as that's pretty vague and amorphous.

Could you be more specific with maybe a single example of where my physical geographic location is electronically critical for a purpose that isn't elective/optional/avoidable?

(And I'm not just trying to be obtuse. I think you're touching on at least part of the 'heart' of both this conversation and that of digital ID verification.)

quickthrowman 58 days ago

How does tracking the movements of individual humans aid shipping and logistics, other than providing traffic data to freight companies? How did we manage to have global supply chains prior to GPS being invented?

Edit: I assume I am missing a crucial part of logistics that you’re familiar with, genuinely curious.

ninalanyon 58 days ago

In what sense can the latitude and longitude of my house be called anonymous data?

kube-system 58 days ago

Ultimately, a map is anonymous data containing the lat/lon of everyone's house.

Alone, these points are not deanonymizing, it's when there's other data associated.

ninalanyon 52 days ago

When such data is sold, I'm pretty sure it would be more than just a list of coordinates.

ramoz 58 days ago

From what I've seen none of this is that complex, one could simply 'draw a circle around your house' and get all the "anonymized" device pings and just trace those.
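The "draw a circle" query is straightforward to sketch: find every device with a ping within some radius of a point, then pull all of those devices' pings. A minimal version, assuming a hypothetical (device_id, lat, lon) ping format and a spherical-Earth haversine distance; the radius is the 1000 ft (304.8 m) figure from upthread:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; treats the Earth as a sphere


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def devices_in_circle(pings, lat: float, lon: float, radius_m: float) -> set[str]:
    """IDs of devices with at least one ping inside the circle."""
    return {dev for dev, plat, plon in pings if haversine_m(plat, plon, lat, lon) <= radius_m}


def trace(pings, device_ids: set[str]) -> list:
    """Every ping from the selected devices, wherever it was recorded."""
    return [p for p in pings if p[0] in device_ids]


# Hypothetical pings: (device_id, lat, lon).
pings = [("a", 40.0, -74.0), ("a", 40.5, -74.0), ("b", 40.001, -74.0), ("c", 41.0, -74.0)]
nearby = devices_in_circle(pings, 40.0, -74.0, radius_m=304.8)
print(sorted(nearby))              # ['a', 'b']
print(len(trace(pings, nearby)))   # 3
```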

1121redblackgo 58 days ago

Yep. With a side channel, or by thinking one step beyond what the laws cover, it's trivial to get around said laws. Need better laws.

malfist 58 days ago

> A lot of geolocation data on the market is anonymized

A lot isn't good enough.
