by jwandborg 2324 days ago
> Yes it's called statistics and probability theory.

My understanding of statistics is:

- I can halve the % insanity by adding another 100% of good labels.

- If I want to reduce the insanity of labels to 1/33rd of ~33% I need to add another 3200% of good labels.

- If I want to reduce the insanity to 0% I need to balance the bad labels with an infinite amount of good labels.
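The arithmetic in the list above is consistent if "insanity" is read as the ratio of bad labels to good labels. That reading is an assumption; the comment does not define the term. A minimal sketch under that assumption, with hypothetical helper names and a made-up starting set of 1 bad label per 3 good ones:

```python
from fractions import Fraction

def extra_good_needed(reduction_factor):
    """Extra good labels, as a fraction of the current good labels,
    needed to divide the bad-to-good ratio by reduction_factor."""
    return Fraction(reduction_factor) - 1

def ratio_after_adding(bad, good, extra_fraction):
    """Bad-to-good ratio after growing the good labels by extra_fraction."""
    return Fraction(bad) / (good * (1 + extra_fraction))

# Hypothetical starting point: 1 bad label per 3 good labels (~33%).
bad, good = 1, 3

for factor in (2, 33):
    extra = extra_good_needed(factor)
    print(f"divide by {factor}: add {int(extra * 100)}% good labels,",
          f"new ratio = {ratio_after_adding(bad, good, extra)}")
```

Under this reading the ratio only reaches 0 in the limit of infinitely many good labels, which matches the third bullet. If "insanity" instead means the share of bad labels in the whole set, the percentages come out differently.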

Is there anything I'm missing entirely except probability theory? Is probability theory the answer or is there something else?

2 comments

You don't reach 0%, that's a straw man. The goal is to do better than humans, who account for the 35,000+ vehicle-related fatalities that happen in the U.S. each year.

There's a disconnect here.

People who talk about the danger of humans driving cars always seem to talk about the raw numbers, because humans drive cars a lot and the raw numbers are rather large.

But when we talk about automated driving, it's in percentages, because it's not being done on the same scale.

So to compare apples to apples, you'd have to convert the number of fatalities to an accuracy percentage. Have you considered trying? There is certainly more than one way to do it, but it would greatly contribute to the discussion if you made some attempt.

> you'd have to convert the number of fatalities to an accuracy percentage

Tesla's early results for their very limited "self-driving" technology have shown a huge reduction in accidents for any given period of time the vehicles are on the road.

That seems like it incorporates a lot of assumptions. I think it's best to slow down and realize that comparisons don't mean much if you're comparing the wrong things. The first step is to determine the first thing that you are comparing and exactly what it is. Then you can move on to the other half and determine whether it is appropriate.

Humans are much safer than people on average, when driving in conditions suitable for Autopilot.

> Humans are much safer than people on average

This makes zero sense and isn't how "average" works. For the same 1000 hours on the road, a Tesla car with Autopilot will have fewer accidents than a car driven for 1000 hours by humans. This changes as driving conditions get worse, at which point humans outperform Autopilot.

Deleting half a sentence and saying it doesn't make sense?

The way "average" works is that you average over something - a population or set. It is very important to be clear about what that something is and whether it's appropriate.

Why do you believe that Autopilot outperforms humans in comparable conditions? If this is based on Tesla marketing, I'm extremely prejudiced against them, and assume out of hand that they simply aren't making the right comparison and don't care. However, if you think that is incorrect, you could elaborate on why you have the opinion you do.

It's hard to reach 0% bad labels because:

1. You can't have an infinite amount of good labels.

2. Humans are in charge of labeling too.

The question is whether you can reliably overcome the bad labels in your training set, so that 33% bad labels equates to <33% "insanity" in the system.
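One toy model where 33% bad labels does give less than 33% error: suppose each example is labeled several times, each label is independently wrong with probability 0.33, and the system takes a majority vote. None of those assumptions come from the thread, and `majority_error` is a hypothetical helper; real label errors are often correlated, which this sketch ignores.

```python
from math import comb

def majority_error(n_votes, p_bad):
    """Probability that more than half of n_votes independent labels
    are bad, i.e. that a majority vote picks the wrong answer."""
    if n_votes % 2 == 0:
        raise ValueError("use an odd number of votes to avoid ties")
    return sum(
        comb(n_votes, k) * p_bad**k * (1 - p_bad)**(n_votes - k)
        for k in range(n_votes // 2 + 1, n_votes + 1)
    )

# Assumed per-label error rate of 0.33, taken from the "33%" in the thread.
for n in (1, 3, 9, 33):
    print(n, round(majority_error(n, 0.33), 4))
```

With a single vote the error is just the 0.33 label error rate; with more independent votes it shrinks, but it never reaches exactly 0 for a finite number of votes. At a per-label error rate of 0.5 voting does not help at all.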

Your understanding is wrong for anything nonlinear. The whole reason machine learning is useful is that it is nonlinear.

How nonlinear are we talking? My understanding is probably closer to the truth than to the opposite of the truth. I'm looking for an estimate of how far from the truth I am.

How would a system reliably discredit missing labels while still learning from good labels? The simplest solution would be that the system is able to spot the bad/missing labels itself with some certainty, but that seems like a catch-22.