by JoshTriplett
3113 days ago
The biggest concern is that it's really easy to de-anonymize location data, even if all you have is numbers moving around. Some locations are uniquely identifying (e.g. homes and offices), and other locations are very sensitive. Better to avoid collecting it altogether, if you can.

Also note that bucketing is not sufficient if the lowest bucket is just 0 and the next one up starts at 1, because people walking alone will then easily show up as they go past a series of sensors. There will be a temptation to separate out zero, because it'll seem important to distinguish between "no traffic" and "small amount of traffic". Resist that temptation.

If any other reported information can be correlated with the counts, it'll be easy to use that for further identification. For instance, pressure or vibration information, noise-level information, or many other sensors could easily be dual-purposed as a presence-detect. Beyond that, please read up on all the ways that certain sensor data can have side-channels; for instance, sufficiently high-resolution accelerometer data can be turned into audio with enough fidelity to recognize speech.
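To make the bucketing point concrete, here is a rough Python sketch of reporting counts so that zero is never distinguishable from a small count. The function name `bucket_count` and the `floor` and `width` values are made up for illustration, not recommended thresholds; picking real ones needs its own analysis.

```python
def bucket_count(count: int, floor: int = 5, width: int = 10) -> str:
    """Map a raw sensor count to a coarse label.

    Zero and every count below `floor` share the lowest bucket, so a lone
    pedestrian cannot be told apart from "no traffic". The default floor
    and width are illustrative assumptions only.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count < floor:
        # Lowest bucket deliberately merges 0 with small counts.
        return f"0-{floor - 1}"
    # Higher buckets are fixed-width ranges starting at `floor`.
    low = floor + ((count - floor) // width) * width
    return f"{low}-{low + width - 1}"


if __name__ == "__main__":
    for raw in (0, 1, 4, 5, 14, 15):
        print(raw, "->", bucket_count(raw))
```

With this scheme an empty street and a single person walking past produce the same reported value, which is the property to preserve even when separating out zero looks useful.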
Could you recommend any specific sources I or anyone else on the team could read about how best to work with and protect this kind of data?