Hacker News
by mabbo 3013 days ago
With one data point, you can't extrapolate much. This is a misuse of statistics.

Consider a new lottery where you aren't sure what the odds of winning are. You play it three weeks in a row, and the third time you win a million dollars. Conveniently, no one else has tried the new lottery yet.

Does it follow then that the odds of winning a million dollars are 1 in 3? Or should you play it a few more times before you declare to all that one in three plays will make one a millionaire?
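To put a number on the lottery intuition: one standard way to bound a probability from k wins in n plays is the exact Clopper-Pearson confidence interval. That choice is mine, not something from the thread, and the helper names below are hypothetical. A minimal stdlib sketch:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect(f, lo, hi, iters=100):
    """Root of a monotone function f on [lo, hi] by repeated halving."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid  # sign change lies in the upper half
        else:
            hi = mid  # sign change lies in the lower half
    return (lo + hi) / 2

def clopper_pearson(k, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    alpha = 1 - conf
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

low, high = clopper_pearson(k=1, n=3)
print(f"1 win in 3 plays: 95% interval for p is {low:.3f} to {high:.3f}")
```

For one win in three plays the interval runs from under 1% to roughly 91%, which is one way of saying that a single win pins down the odds very loosely.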


One accident is clearly not one data point. If Uber had driven a billion miles with 0 accidents, we would safely conclude they were safer than human drivers with "0 data points".

Assuming that accidents are independent, we can model this as a Poisson point process. If the accident rate is 1 per 100M miles and Uber has driven 3M miles, the probability of n accidents in that time is P{N=n} = ((λt)^n / n!) * e^(-λt), where λ = 1/100M and t = 3M, so λt = 0.03. For n = 0 this reduces to P{N=0} = e^(-λt), which comes to about 97.04%.
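A quick check of that figure, evaluating the same Poisson formula with the same λ and t. The function name `poisson_pmf` is just an illustrative choice:

```python
import math

def poisson_pmf(n, rate, exposure):
    """P{N = n} for a Poisson process with the given rate over the given exposure."""
    mu = rate * exposure  # expected number of events, i.e. lambda * t
    return mu**n / math.factorial(n) * math.exp(-mu)

# Figures from the comment: 1 accident per 100M miles, 3M miles driven.
p_zero = poisson_pmf(0, rate=1 / 100e6, exposure=3e6)
print(f"P(no accidents) = {p_zero:.2%}")  # 97.04%
```

The remaining probability, 1 - p_zero, is the roughly 3% chance of at least one accident over those miles.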

So, yes. It is possible that Uber's accident rate is 1 in 100 million. If so, this incident would fall in that remaining 3%. It's unlikely, but possible.

Gill Pratt, who heads up Toyota's autonomous development initiative, mentioned that we would need to drive 8.5 billion autonomous miles to be able to declare with 99% statistical certainty that autonomous vehicles are safer than human-driven ones. Of course, as we are witnessing, great pains will be taken after every preventable injurious or fatal collision to ensure that sort of failure never happens again, so by the time we get to the 8 billionth mile the software and hardware will have improved considerably, rendering the early data moot.

One thing we can say about the woman killed the other day by an autonomous Uber is that, unlike the other ~40,000 killed on America's roads over the past year, hers was not in vain.

Every day that we delay the widespread deployment of this technology, it's another 100 or so people dead. Of course, the public is unlikely to see it that way. They see one death as a tragedy, but 40,000 is just a statistic, business as usual, nothing to get excited about.

I don't really understand a whole lot of similar comments. It's as if Uber and every other company pursuing autonomous driving were doing so for the betterment of everyone's lives, and so their laziness in making sure their tech is good enough can be excused. The car in question failed spectacularly, and the response is that her death was not in vain?

Uber is doing this for money, and so are all the other companies, even if there is a potentially huge collateral benefit for the human race. That benefit is definitely not the companies' goal, regardless of any PR talk. So when you gamble with people's lives for money and fame, you should go to jail for a long time, executives and engineers alike.

The statistics are just being used to sustain corporate greed, in my mind, and we should not let them. Self-driving cars have lots of potential to save lives, and so do other technologies. That doesn't mean all sense of responsibility and ethics goes away just because of the potential.

The pharmaceutical industry tries to save lives, for money. They're driven by greed. They also have a long track record of fucking up big time and people dying because of it. Does this mean we should stop giving people medicine? Would letting people get sick and die be preferable to allowing imperfect industries with a profit motive to try to save them?

What is currently being done with self-driving car testing on public roads is basically like a pharmaceutical company mixing a new experimental drug into the dishes of random people at a restaurant, who are neither compensated in any way nor given a chance to decline their involvement in the test.

Such a thing would be entirely unthinkable in the pharmaceutical industry of today. So if this comparison suggests anything, it is to much more strictly regulate self-driving car development and testing!

That's a strawman argument... These driverless car tests are more like early phase clinical trials. Clinical trials are conducted under very tight controls, using participants who have given informed consent. Drugs don't get licensed until they can prove efficacy and safety and are approved by the FDA.

The current procedures for clinical trials are the result of decades of experience, where mistakes (and yes, occasionally shortcuts driven by greed) did result in avoidable deaths of trial participants.

For an example of how the pharma industry deals with this 'safety first' versus 'stifling innovation' dilemma, read this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1526936/

The inevitable result of this crash is driverless vehicle testing will get regulated more like drug trials...

That’s not really an apples-to-apples comparison. In the pharmaceutical case, people with terminal-stage diseases don’t really have any choice other than to take experimental medicine.

On the other hand, there are widely used, cheap, and efficient alternatives to self-driving cars on the roads today: human drivers, public transport, carpools, etc.

If you want to make analogies, I think the self-driving car accident is more like an elevator company accidentally crashing an experimental high-speed elevator in a shopping mall. I think it would be grossly negligent on the part of the company to test such an unproven device on the general public, especially if people’s lives are being put at a risk they did not have to worry about before.

Maybe her death wasn't in vain, but it was definitely avoidable. If Uber rushes out half-baked driverless cars, fallout from the incidents they're responsible for will cause serious delays to widespread deployment.

Trading lives to save on R&D time would violate professional codes of ethics in literally any other industry.

Her death was in vain. These kinds of fundamental scenarios can be practised with dolls or stuntmen on closed test tracks, not on public roads...

If Uber needed data, they could have driven manually. Obviously their obstacle tracking is bad.

It wouldn't violate professional codes of ethics in a war, and we're taking wartime casualty numbers on our roads everyday.

This is the real trolley problem when it comes to the ethics of developing self-driving cars.

Other big players have taken great pains to ensure that it wouldn’t happen in the first place.

Uber should be banned from doing AI research on public roads

It is not a misuse of statistics for "one data point" to significantly shift our beliefs. Let's do the math.

Bayesian approach: To make the math really simple, let's assume a discrete prior on Uber's death rate. Say there's a 33% chance that Uber's cars are much safer than humans (0.1 deaths per 100M miles), 33% that they are equally safe (1 death per 100M miles), and 33% that they are much more dangerous (10 deaths per 100M miles). After observing one death at 3 million miles, your posterior should update to {safer: 1%, equal: 11%, more dangerous: 88%}. This is a substantial shift in confidence.

Math: http://www.wolframalpha.com/input/?i=(1-10%2F100)%5E2*(10%2F...)
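The same update can be sketched with a Poisson likelihood for one death in 3M miles. That likelihood is an assumption on my part: the linked calculation appears to split the miles into million-mile blocks instead, so its last figure differs by about a point. The `prior` dictionary mirrors the three hypotheses above with equal weights:

```python
import math

# The three hypotheses from the comment: name -> (deaths per 100M miles, prior weight).
prior = {"safer": (0.1, 1 / 3), "equal": (1.0, 1 / 3), "more dangerous": (10.0, 1 / 3)}

def posterior(prior, deaths, miles):
    """Bayes' rule with a Poisson likelihood for `deaths` observed over `miles`."""
    unnormalized = {}
    for name, (rate_per_100m, weight) in prior.items():
        mu = rate_per_100m * miles / 100e6  # expected deaths under this hypothesis
        likelihood = mu**deaths * math.exp(-mu) / math.factorial(deaths)
        unnormalized[name] = weight * likelihood
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

for name, p in posterior(prior, deaths=1, miles=3e6).items():
    print(f"{name}: {p:.0%}")
```

This comes out at roughly 1%, 11%, and 87%, essentially the same shift described in the comment.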

Frequentist approach: Let the null hypothesis be that Uber's self-driving cars have the same death rate as humans: 1 death per 100 million miles. Under that hypothesis, the probability of Uber killing at least one person within 3 million miles is about 3%. Therefore, we can reject the null hypothesis with a p-value of 0.03. One positive "data point" is statistically significant.
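Here the p-value is the one-sided probability, under the null rate, of seeing at least as many deaths as were observed. A minimal sketch; the 0.05 threshold is the conventional choice and an assumption here, since the comment does not name one:

```python
import math

def poisson_sf(k, mu):
    """P(N >= k) for N ~ Poisson(mu), i.e. one minus the CDF up to k - 1."""
    return 1.0 - sum(mu**i / math.factorial(i) * math.exp(-mu) for i in range(k))

# Null hypothesis: 1 death per 100M miles. Observed: 1 death in 3M miles.
mu_null = 3e6 / 100e6
p_value = poisson_sf(1, mu_null)
print(f"one-sided p = {p_value:.3f}")
print("reject at alpha = 0.05" if p_value < 0.05 else "fail to reject at alpha = 0.05")
```

With one observed death this is just 1 - e^(-0.03), about 0.03, matching the figure above.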

Statistically, one death after 3 million miles is not proof that Uber's death rate is higher than 1 in 100 million miles. But it is statistically significant, in both a frequentist and Bayesian framework. You have to get really, really unlucky to have a death at 3 million miles if your death rate is 1 per 100 million miles.

Bottom line: This collision isn't proof, but it's strong evidence. (To go along with all the evidence from crash rates, disengagement rates, engineers working at these companies, and the video of the crash itself.)

If you do want to extrapolate from their data, it would be worth looking at the total number of accidents, not just fatalities. Insurance companies say people file an accident claim every 18 years on average. If the average person drives 12,000 miles a year, this means they get in an accident every 216,000 miles on average. If Uber drove 3 million miles, we should expect them to have been involved in about 14 accidents over those 3 million miles if the cars are on par with humans.
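The arithmetic behind that estimate, using the comment's own rough figures (which the commenter describes as guesstimates):

```python
# Rough figures from the comment, not authoritative statistics.
years_between_claims = 18
miles_per_year = 12_000
uber_miles = 3_000_000

miles_between_accidents = years_between_claims * miles_per_year  # 216,000
expected_accidents = uber_miles / miles_between_accidents        # about 13.9
print(f"{miles_between_accidents:,} miles per accident; "
      f"expect about {expected_accidents:.1f} accidents in {uber_miles:,} miles")
```

About 13.9 expected accidents rounds to the 14 quoted above. A count that size is far more informative than a single fatality, which is the point of looking at all accidents.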

(I'm guesstimating some of those numbers, but they should be somewhere in the ballpark.)

You'd also need to account for disengagements. Some portion of all disengagements were likely made to avoid accidents.

Good point, although I think many of the disengagements are basically the car saying "I don't know what to do safely," and without a driver to take over it would simply pull over and stop.

There’s more than one data point showing that Uber should not be allowed near anything as safety-critical as self-driving software.

They let a car on the road that couldn’t even stop at a red light ffs

https://www.theverge.com/2017/2/25/14737374/uber-self-drivin...

The fact that they were allowed to deploy in Arizona after this is really regrettable

And it’s totally unsurprising that Uber “got the first kill”

It should be possible to create a Bayesian model of the posterior distribution of fatalities at this point. That distribution will be pretty broad, and not Gaussian, so talking about the mean is somewhat meaningless. Nonetheless, you could certainly compare that distribution with the posterior for human-driven cars and draw a conclusion like: “it is Xx% likely that the Uber fatality rate is at least twice that of a human driver.”
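A minimal sketch of that idea, under assumptions that are mine and not the commenter's: a Gamma prior on Uber's per-mile death rate (conjugate to the Poisson likelihood, so the posterior is again a Gamma), the Jeffreys prior Gamma(0.5, 0) as the default, and the human rate treated as a known 1 per 100M miles instead of having its own posterior. The helper names are hypothetical:

```python
import math

def regularized_lower_gamma(a, x, tol=1e-12, max_iter=10_000):
    """P(a, x) = gamma(a, x) / Gamma(a) via its power series (fast for small x)."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > tol * total and n < max_iter:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def prob_rate_exceeds(threshold, deaths, miles, prior_shape=0.5, prior_rate=0.0):
    """Posterior P(rate > threshold) for a Poisson rate per mile.

    Gamma(prior_shape, prior_rate) prior -> Gamma(prior_shape + deaths,
    prior_rate + miles) posterior. The default (0.5, 0) is the Jeffreys prior.
    """
    shape = prior_shape + deaths
    rate = prior_rate + miles
    return 1.0 - regularized_lower_gamma(shape, threshold * rate)

human_rate = 1 / 100e6  # deaths per mile, treated as known (a simplification)
p = prob_rate_exceeds(2 * human_rate, deaths=1, miles=3e6)
print(f"P(Uber rate >= 2x human rate) = {p:.1%}")
```

Under these assumptions the answer comes out around 99%. The number does depend on the prior: a flat prior (`prior_shape=1.0`) pushes it slightly higher, so the choice should be stated alongside any result like this.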
One data point might not be enough, but two may be too many for the industry to bear.

It only took two incidents to shut down the Concorde program.

They did not sample three times and win once. They sampled three million times and won once. Hardly "one data point."

I believe “one data point” referred to the one win, not the three plays.

The point remains: it isn't fair to decide what the probability is with only one data point on one side, especially for rare events.

Would it have been fair if Uber last week were to declare that they have a 0% probability of pedestrian deaths, since they'd never had one yet?

The goal of these statistics is to predict future outcomes. But with such a small data sample, you cannot fairly predict the future, just as in my lottery example.

What if Uber’s first self-driving car killed a cyclist in its very first mile of operation? Would you find it equally hard to draw conclusions?

It would have been fair to estimate the rate as near zero per however many miles they had driven, just as it is fair to estimate it as one per three million miles today. Think of it the other way: we know with a high degree of certainty that fatalities are not one per mile, for example.

It still falls under "one swallow does not make a summer", even though a lot of days of winter preceded that bird sighting.

That is one data point: one death.

One data point? So if Uber had driven a billion miles with zero deaths, they'd have zero data?

If you're looking for the rate at which something happens, for the purpose of predicting future events, you need to see the event occur many times before you can fairly estimate its rate. That's what my example means.

I'm not defending Uber here- I'm defending statistics!

With more data, we may discover that Uber cars are 100X worse, not just 25X. Or we may discover they're better. But we don't have the statistical power to make that estimate when we've only seen the event happen once.
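One way to make "not enough statistical power" concrete is an exact (Garwood) two-sided confidence interval for a Poisson count, converted to a rate and compared with an assumed human rate of 1 death per 100M miles. This is my illustration, not something from the thread, and the helper names are hypothetical:

```python
import math

def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu)."""
    return sum(mu**i / math.factorial(i) for i in range(k + 1)) * math.exp(-mu)

def bisect(f, lo, hi, iters=200):
    """Root of a monotone function f on [lo, hi] by repeated halving."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_interval(k, conf=0.95):
    """Exact (Garwood) confidence interval for the mean of an observed Poisson count k."""
    alpha = 1 - conf
    lower = 0.0 if k == 0 else bisect(
        lambda mu: (1 - poisson_cdf(k - 1, mu)) - alpha / 2, 0.0, 1000.0)
    upper = bisect(lambda mu: poisson_cdf(k, mu) - alpha / 2, 0.0, 1000.0)
    return lower, upper

miles = 3e6
human_per_100m = 1.0  # assumed human fatality rate
lo_mu, hi_mu = exact_poisson_interval(1)
lo_ratio = lo_mu / miles * 100e6 / human_per_100m
hi_ratio = hi_mu / miles * 100e6 / human_per_100m
print(f"Uber/human fatality-rate ratio: {lo_ratio:.2f}x to {hi_ratio:.0f}x (95%)")
```

For a single death the interval runs from slightly below the human rate (about 0.84x) to nearly 190x, so both "a bit better" and "100X worse" remain plausible with the data so far.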

Your knowledge of statistics could use some enhancement. The response referring to Poisson processes is a better view of things.

> Your knowledge of statistics could use some enhancement.

In this we agree.

With 0 accidents you cannot accurately measure the rate, but you can estimate an upper bound on it for whatever probability of being wrong you're willing to accept.
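For a Poisson process that bound has a closed form: after zero events over an exposure t, any rate above -ln(1 - C) / t would make "zero events" less likely than 1 - C. At 95% this is the familiar "rule of three" (about 3 / t). A minimal sketch, applied to the billion-miles hypothetical from upthread:

```python
import math

def zero_event_upper_bound(exposure, conf=0.95):
    """Upper confidence bound on a Poisson rate after zero events over `exposure`.

    Solves exp(-rate * exposure) = 1 - conf for the rate.
    """
    return -math.log(1 - conf) / exposure

# Hypothetical from upthread: a billion miles with zero deaths.
bound = zero_event_upper_bound(1e9)
print(f"95% upper bound: {bound * 100e6:.2f} deaths per 100M miles")
```

That works out to roughly 0.3 deaths per 100M miles, so a billion clean miles would be real evidence even with no events to count.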