by vforgione 3111 days ago
Right now our test nodes don't report any data about traffic (people or vehicles) near a given node, but some ideas have been floated for getting a read on pedestrian and vehicle density. We're not sure how we would do it, but I'd love to get some input on what seems both effective and ethical.

The biggest concern is that it's really easy to de-anonymize location data, even if all you have is numbers moving around. Some locations are uniquely identifying (e.g. homes and offices), and other locations are very sensitive. Better to avoid it altogether, if you can.

Also note that bucketing is not sufficient if the lowest bucket is just 0 and the next one up starts at 1, because people walking alone will then easily show up as they go past a series of sensors. There will be a temptation to separate out zero, because it'll seem important to distinguish between "no traffic" and "small amount of traffic". Resist that temptation.
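
To make the bucketing point concrete, here is a minimal sketch in Python. The bucket boundaries, labels, and the `bucket_count` function are all assumptions made up for illustration, not anything the nodes currently do, and a floor of 5 is an arbitrary choice rather than a guarantee of anonymity. The only point it shows is that the lowest bucket covers zero and small counts together.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries (assumed for illustration only).
# Each edge is the first count that belongs to the NEXT bucket, so the
# lowest bucket covers 0 through 4 and never separates out zero.
BUCKET_EDGES = [5, 20, 50]
BUCKET_LABELS = ["0-4", "5-19", "20-49", "50+"]


def bucket_count(raw_count: int) -> str:
    """Map a raw count to a coarse label before it leaves the node."""
    if raw_count < 0:
        raise ValueError("count cannot be negative")
    return BUCKET_LABELS[bisect_right(BUCKET_EDGES, raw_count)]


# One person walking alone past three sensors (A, B, C) over three time steps.
lone_walker = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
reported = [[bucket_count(c) for c in step] for step in lone_walker]

# Every reading is "0-4", exactly what an empty street would report,
# so the walker's path does not show up in the published data.
print(reported)
```

If the lowest bucket were split into "0" and "1-4", the same walk would show up as a single non-zero reading moving from A to B to C, which is exactly the trace described above.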

If any other reported information can be correlated with the counts, it'll be easy to use that for further identification. For instance, pressure or vibration readings, noise levels, or data from many other sensors could easily be repurposed for presence detection.

Beyond that, please read up on all the ways that certain sensor data can have side-channels; for instance, sufficiently high-resolution accelerometer data can be turned into audio with enough fidelity to recognize speech.

This is excellent! I appreciate how much detail you’ve put into this.

Could you recommend any specific sources I or anyone else on the team could read about how best to work with and protect this kind of data?

I can recommend two classes of resources.

First, take a look at material on the "security mindset", starting with https://www.schneier.com/blog/archives/2008/03/the_security_... . Everyone on your team needs to be asking, for every new feature, "How could this be exploited?"

Second, for the specific case of sensors, start looking at research on sensor side-channel attacks, and how sensors can be used to gather information you wouldn't expect. For instance, see "Sensor Side-Channel Implications on User Privacy: Analysis and Mitigation". And take a look at some of the sensor-related work coming out of the various workshops on "Cyber-Physical Systems Security".

Finally, please keep in mind that it's still risky to deploy sensor nodes that even have the capability to do this kind of collection. Even if you keep all of the above in mind and do everything you can to mitigate it, the capability will still exist, and all it would take is some malicious policy changes to abuse your work and your infrastructure and turn it into a massive invasion of privacy.

With that in mind, start now, while policies are in your favor, arranging maximum transparency for the nodes, source code, data collection, and similar. That way, if anyone ever does try to abuse your work and your infrastructure, it'll be extremely obvious, and if anyone tries to remove the transparency first, its sudden absence will be conspicuous. That same "security mindset" I mentioned above also applies to policies and administrations; take the time, while they are in your favor, to plan ahead for the scenario where they are not. Plan ahead for something you hope you never need, because once you find out you do need it, you might not have the option of building it anymore.