I agree it's possible that the name-to-gender mapping was done before the full ride data was handed over to this analyst. (Though just removing real names would still leave a lot to be desired in the anonymizing process.) However, there's no mention in these posts of such safeguards, and subjectively the post reads more like the analyst is just fishing around in the full raw dataset of ride times, start and end locations, and names.

To wit: "What else can we learn? First, we can devise a way to statistically assess whether there are more women or men in a neighborhood than we’d expect. [...] We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s gender given their name, only accepting a match if the probability was >= 95%."

And in the original post, he categorizes rides as possibly related to a late-night hookup based on whether the destination of one ride and the departure point of another are within 0.1 mi of each other.

> Internal metrics teams nearly always have access to complete data. The issue is sharing non-anonymized data externally.

I disagree pretty strongly with this. Do you think that your average Uber rider would be OK with Uber employees analyzing their ride patterns (with their real names attached) to try to figure out where and when they are having sex? Do you think Uber should allow such access to its employees by policy? (It seems we agree that writing a blog post about it is not a great idea.)
I don't see how this is any different from Google analyzing search data to try to figure out whether I'm pregnant. You could make the argument that "it's an algorithm," but at some point someone had to sit down and build that model.