Hacker News
by mmcclure 4224 days ago
I gotta say, I'm not really seeing the creepy / cringey / evil / whatever-else here...

Anyone (especially the HN crowd) should know they have the data, and if you think they're not carefully analyzing it behind the scenes (like every other tech company who has your data), I've got things to sell you. I personally think a tiny peek like this into the data, much like the usage posts that OKCupid, YouPorn, and others give, is neat.

11 comments

The problem here (for me personally, at least) is that Uber is not in the business of selling dates/"encounters" and that people don't expect a ridesharing company to go right for the sexual data. Even OKCupid is straddling the line here with http://blog.okcupid.com/index.php/we-experiment-on-human-bei... noting that:

  To test this, we took pairs of bad matches (actual 30% match) and told them they were exceptionally good for each other (displaying a 90% match.)
That's really not something people like having done to them. And the "HN crowd" shouldn't have an expectation of privacy and decency in data? Of course they're analyzing data, but it's really the viewpoint from which they do it that is unsettling. OKCupid says "no, duh, we're unethical. Deal with it." Uber says "Check it out! We drew a line between social security checks and prostitution!" (as waterlesscloud notes at https://news.ycombinator.com/item?id=8644138 )

There are a million more beneficial ways that people could be using the data. Fighting hunger, poverty, illiteracy, etc., to me, is a "good" use of Big Data. Looking at sexual habits (when you're not selling sex) or openly manipulating people to get data is, to me, a "bad" use.

The idea that statistics have a moral imperative to be "decent" is as fascinating as it is ridiculous. Anonymized data is not a privacy breach, and Uber probably doesn't have any data that can help with "hunger, poverty, illiteracy, etc.".

I'm sorry if the idea that "people's short overnight stays are evident in their travel data" makes you blush, but that isn't anyone else's problem.

The whole point is that it's not anonymized and it's being inspected for and used for purposes that have little to do with the ostensible customer-service agreement.

People aren't used to transit companies interrogating them about the purposes of their journeys; they just want the transit company to get them from point A to point B. (Imagine if they did this when you got in the car: "Where are you going? Why?")

And obviously from a business perspective the more you understand your customers and their motivations the better you can serve them.

But let's not kid ourselves. This isn't anonymized data. Uber's publishing in a format that is unspecific, but they have all of the detailed data and can poke through it and infer things at their leisure, and they have no compunction around how they're doing it or why.

This is why ethics and trust around data collectors are really important. Uber seems pretty cavalier about it, and that actually is a problem.

> But let's not kid ourselves. This isn't anonymized data. Uber's publishing in a format that is unspecific, but they have all of the detailed data and can poke through it and infer things at their leisure, and they have no compunction around how they're doing it or why.

That's a fairly large accusation to make.

This blog post was originally published in 2012 - two years ago. Since then has anything come out that would confirm your suspicions? I haven't seen anything.

Sure, I don't normally like linking to TC but this has a pretty good roundup of links: http://techcrunch.com/2014/11/20/following-pressure-from-u-s...
Well shut my mouth, thanks for the link.

..even if it's from TC (I won't hold it against you).

I included the word "decent" because the way they used data goes beyond people's "overnight stays," they previously analyzed the spending patterns of people and tied it to welfare checks and prostitution. They immediately call it "one of the coolest things about working for a data-driven company like Uber" afterwards. It's bad data science, not only because it's only a correlation and not an experiment so nothing can be proven, but because they use these unproven claims to say outlandish and unethical things.

I meant "decent" in an ethical sense, not in a conservative "don't you look at my 'short overnight stays'" sense.

> It's bad data science, not only because it's only a correlation and not an experiment so nothing can be proven, but because they use these unproven claims to say outlandish and unethical things.

I don't disagree that they've not scaled any sort of pinnacle in data science, but neither do I think what they're reporting is uninteresting.

In what way is what they're saying outlandish and unethical?

A little off-topic, but I don't see why OKCupid's actions here are unethical. Their matching algorithm isn't perfect, so they shouldn't treat it as an oracle of truth. How else would they discover false negatives in their algorithm? Especially since, in this case, a false negative is worse than a false positive (not meeting someone you'll like vs having one unsuccessful date).
> How else would they discover false negatives in their algorithm?

This is exactly why research that deals with humans at universities invariably must pass a human subjects review process. "How else would we discover X?" is certainly not a reason to subject anyone to an unethical experiment. Subjecting people to what you likely believe to be a bad date should very definitely raise red flags, even if the details in practice would pass a human subjects review.

And that's the trouble: there's a tremendous space of research that just isn't ethical to carry out on actual living humans. As such, we have to find methods to determine answers to those questions that don't breach ethical standards. The burdens of discovery must lie squarely on the researchers, not on the (often unwitting) experimental subjects.

Do you think that giving someone an artificially inflated OKCupid match really rises to the standard of an unethical experiment though? OKCupid doesn't tell you who to go on a date with; they just suggest potentially good matches. (Right? I'm married and don't tend to troll dating sites, but that's my understanding.) You're free to read their profile, exchange messages, etc., before arranging a date. If it is indeed a bad match, then most likely you would realize your incompatibility early in the process.
People need to at least understand what's being done and they need to give consent before it happens. Otherwise, you're literally toying with people's lives. And in this case it's not in some insignificant way: you're manipulating their romantic and sexual endeavors.

It's actually far, far more invasive than what Uber did as they described it in the blog post.

Have you read the terms and conditions of your latest bank account? The level of forced consent to third-party disclosure may alarm you.
That's a completely different category of life violation though. Imagine instead that your bank was lying to you about your account balance, modifying it to be plus or minus 3% of the actual balance. Without your consent or knowledge. All to conduct a "psychology/market experiment".

Then it would be equivalent.

Nothing in this post involved distortion of customer data. They just linked up transaction time/date and geo-location data. Then did some simple math. It's not out of the question that your payment processor could replicate this analysis... once your credit card processor cuts a deal to geo-tag your purchase history. Of course almost all fixed POS hardware is geomapped, and the mobile stuff is trackable, so that's not much of a stretch.
Don't they sell people on their super-accurate-awesomesauce-state-of-the-art matching algorithm? Were people warned that they may be guinea pigs?
It was unethical because they didn't warn their users ahead of time that they might randomly be opted into the alternate pairing system.
The Uber post almost certainly did not violate anyone's privacy. They ran a bunch of aggregate queries that probably dropped any PII pretty early on. They did not publish a list of riders who took a ride of glory.

(I say they "probably dropped PII" because when you do work of this sort, PII is boring data that slows down your calculations.)
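
To make "dropped any PII pretty early on" concrete, here is a minimal sketch of what such an aggregate query might look like. Everything in it is an assumption for illustration: the records, the field names (`rider_id`, `name`, `pickup_time`, `neighborhood`), and the function `rides_per_neighborhood_hour` are hypothetical and do not come from Uber's post or systems.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ride records. The field names and values are made up for
# illustration and are not taken from Uber's actual schema or data.
rides = [
    {"rider_id": "r1", "name": "Alice", "pickup_time": "2012-03-10T23:15:00", "neighborhood": "Mission"},
    {"rider_id": "r2", "name": "Bob", "pickup_time": "2012-03-10T23:40:00", "neighborhood": "Mission"},
    {"rider_id": "r1", "name": "Alice", "pickup_time": "2012-03-11T07:05:00", "neighborhood": "SoMa"},
]


def rides_per_neighborhood_hour(records):
    """Count rides per (neighborhood, hour of day), ignoring rider identity."""
    # Project each record down to the two non-identifying fields before
    # counting, so rider_id and name never reach the aggregate.
    keys = (
        (r["neighborhood"], datetime.fromisoformat(r["pickup_time"]).hour)
        for r in records
    )
    return Counter(keys)


counts = rides_per_neighborhood_hour(rides)
for (neighborhood, hour), n in sorted(counts.items()):
    print(f"{neighborhood} {hour:02d}:00 -> {n} ride(s)")
```

The identifying columns are discarded in the first step and only counts come out the other end. Note that this only describes the published aggregate: a bucket containing a single ride can still point to one person, and the raw records still exist wherever they are stored.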

Similarly, what's wrong with observing a correlation between welfare checks and prostitution? It's an interesting observation. It's potentially useful for public policy and fighting poverty (at least American style relative poverty), though of course a more detailed investigation needs to be done.

There's a difference between this type of post and a post by OkCupid. OkCupid is a dating platform and their blog posts are net-positives for their users. What should I say in my first opening message? What do I wear in a picture to attract a mate?

By contrast, it's simply not professional and reeks of juvenile behavior for Uber to be writing a post like this. Just because you have data and have these thoughts, doesn't mean you have to do the analysis and show the world. It doesn't help their users, it's not even that interesting, and it's not relevant to their value proposition as a business.

Feels to me like someone saw the success of OKCupid and their content marketing strategy and tried to shoehorn in something similar with whatever data they had with less than stellar results.
What you've described is "poor judgment."
Yes. And when considered in light of Hourdajian's statements about privacy and Uber's data-policies, it is rightly termed as "questionable" in the PDF that references this article.
I think the problem is context. Had Uber been a really fun company, we would laugh at insights into our very being.

But since they are accused of trying to dig up dirt on people, this is a chilling reminder that they are more than capable of doing that, and apparently quite willing.

But this was from 2012, when Uber was a "fun" company. They were doing on-demand valentines and mariachi bands. It's in line with a small startup (which they were), and doesn't present any sensitive information.
Sorry, I didn't notice the date, I just saw it on HN the other day and thought "What were they thinking?"

Makes sense to remove it now, since it isn't something they want to highlight in light of recent events.

A limousine service that uses business records to work out when passengers are fucking and then writes articles about doing this, complete with fucking graphs and even some fucking maps, I think qualifies as pretty fucking creepy.
> Anyone (especially the HN crowd) should know they have the data, and if you think they're not carefully analyzing it behind the scenes (like every other tech company who has your data), I've got things to sell you.

That's the creepy bit. Who owns that data? I want to live in a world where I own my data, and it can't be used for creepy purposes like this, or to extract additional value through arbitrage based on asymmetric information availability.

Well, in this digital age, we do not own the data even if we are the ones generating it. So I would not go to the extreme of telling companies not to use my data for any purpose, but I would definitely like some assurances from them that they won't use it for such creepy and unnecessary ends. Time and time again, we are seeing companies abuse our data, be it Facebook manipulating the news feed for experimentation or Uber running such nonsensical studies.
You would have much more control over your personal data in the EU. Companies are required to share it with you on request and are subject to limits on how long they can retain it.
Yeah, the blog post seems fine minus the last sentence. And if they'd simply removed the last sentence, I doubt anyone would have noticed.

The Streisand effect is so well-known that I'm surprised anyone would delete a blog post nowadays.

EDIT: I actually hadn't read the blog post in detail until now, which was more than a little dumb. I thought it was just an analysis of rides along with some neat heatmap images. I didn't realize it was about sexual datapoints.

So well known, much like the cycle of violence...

And yet.

Guys aren't really hitting it on the head. It's about the level of behavior that companies release analysis on.

See, there's another company that occasionally releases interesting data analytics: Google.

See: Word frequency over time, Predicting the spread of viruses from searches, etc.

The issue is that Uber is trying to explain motive and behavior at the individual level ("I know something about you!"). This is something that would be a definite no-no for Google. The cheekiness of the language certainly doesn't help either.

Difference is OkCupid does it with class. Uber doesn't.

The more and more I hear about this company the more I am thankful we have heavily regulated taxis/cabs.

And yet, there exists a huge database of precise pickup/dropoff points and times for NYC yellow taxicabs that someone obtained using a Freedom of Information Act request. http://www.andresmh.com/nyctaxitrips/

I think I'd rather my data only be available to a private company and their handful of engineers than the whole world.

...which Uber will, potentially, drive out of business because their service is cheaper, due to lack of regulation. :(
I agree that the article, much like the similar OKCupid ones, is pretty interesting.

(though it's not very well-written, some analysis a bit iffy, and the guesswork towards the peaks and dips in the graph rather low-effort)

Creepy/evil, maybe not, because the data is clearly anonymised. However, the cringe is all over this article. OKCupid's stuff could easily be just as cringey, but they know it's important to steer clear of that. Also, they're a dating site; if they wrote an article about data-mining one-night stands, that would make sense. Not so much for a taxi company, especially not in light of Uber's general attitude.

The final sentence of the article definitely crossed from "cringe" into "creepy" for me, though. In particular from someone called "Uber".

The PDF in which this article was referenced did so to illustrate the availability of this data for frivolous purposes, and is right to call it "questionable" when considered in light of Hourdajian's statements about privacy and Uber's data-policies.

Yeah between this and the prostitution post I'm not seeing it either. To me it just looks like some interesting trends that occur in Uber's dataset.
It's not about who has what data. It's about what's being done with it.