by jamesrom 3450 days ago
How does Uber anonymize the data? Does it only use a subset of a trip? If you can see where and when a trip started or finished, it's definitely not anonymous enough.
5 comments

From the FAQ: "All data is anonymized and aggregated to ensure no personally identifiable information or user behavior can be surfaced through the Movement tool"

Aggregation is a common method for anonymization. One approach is to only display trips that were made by at least 15 different people in a day.
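
To make the threshold idea concrete, here is a rough sketch in Python. It is not how Uber does it (the FAQ gives no details); the record layout, the zone names, and the `aggregate_trips` helper are all made up for illustration. The only part taken from the comment above is the rule: publish a figure for an origin/destination pair on a given day only if at least 15 different people made that trip.

```python
from collections import defaultdict


def aggregate_trips(trips, min_riders=15):
    """Return mean duration per (origin, dest, day), suppressing sparse cells.

    trips: iterable of (rider_id, origin_zone, dest_zone, day, duration_min).
    This tuple layout is a hypothetical one chosen for the example.
    """
    riders = defaultdict(set)      # distinct riders seen per cell
    durations = defaultdict(list)  # all trip durations per cell

    for rider_id, origin, dest, day, duration in trips:
        key = (origin, dest, day)
        riders[key].add(rider_id)
        durations[key].append(duration)

    # Count distinct riders, not trips: one person making 20 trips
    # must not be enough to unlock a cell.
    return {
        key: sum(values) / len(values)
        for key, values in durations.items()
        if len(riders[key]) >= min_riders
    }
```

The threshold is on distinct riders rather than trip count, so a single commuter repeating the same route never gets a cell published on their own.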

Watch the video again--geographic aggregation. Summary stats from geo-to-geo.

EDIT: I think in the video they showed census tracts, which is one of many geographic units they could choose from.

In their trial with Boston, it was limited to zip codes[1]. Fingers crossed for census tracts, as that would be far more useful.

[1] https://www.boston.com/news/business/2016/06/16/bostons-uber...

Yeah, one scenario I was thinking of was a rural home that's isolated from other buildings. It's plainly obvious that any trip to or from that building belongs to its occupants. There probably has to be some threshold of users (say, > 100 users taking a route) before it's exposed through this service.
Maybe they could randomize the pickup / dropoff points within a small radius, say 1/4 mile.
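
A minimal sketch of what that randomization could look like, assuming plain lat/lon coordinates and a flat-earth approximation (fine at a quarter mile). The `jitter_point` name and the constants are mine, not anything from Uber.

```python
import math
import random

QUARTER_MILE_M = 402.336        # 0.25 mile in meters
METERS_PER_DEG_LAT = 111_320.0  # rough average; an approximation


def jitter_point(lat, lon, radius_m=QUARTER_MILE_M, rng=random):
    """Move (lat, lon) to a uniformly random point within radius_m meters."""
    # sqrt() makes the points uniform over the disk's area; without it
    # they would cluster near the true location.
    r = radius_m * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()

    # Convert the metric offset to degrees. A degree of longitude
    # shrinks with cos(latitude).
    dlat = (r * math.sin(theta)) / METERS_PER_DEG_LAT
    dlon = (r * math.cos(theta)) / (
        METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon
```

One caveat: if a fresh offset is drawn for every trip, the jittered points for a frequently used address will average back toward the true location, so this probably doesn't help the isolated rural home on its own.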
Certainly, this data can be identifiable.