by msandford 1045 days ago
Imagine you see a car 1 mile away as you're preparing to cross the street. 1 sec later, it's a bit closer. You wonder "will this car hit me?". It's hard to say since the car is so far away and your measurements of its speed are so poor.

You wait 5 sec and it's still only imperceptibly closer. You realize there is no way it could possibly hit you. You cross the street unconcerned.


That makes perfect sense. Where it breaks down is if you put percentages on it. If you say the car has a 3% chance of hitting you, and you repeat the process a thousand times and it never hits you, something is wrong with your math.
I wonder if it's the difference between "this asteroid" and "all asteroids". As we learn more about it, we can start to treat it like a process that has repeated, but initially we can't be sure if it's like other asteroids.

Consider the roll of a 6-sided die. What is the chance it will roll a 1?

A person might think, "1 in 6". But what if this is a loaded die? In that case, we need more information before we can classify it as "a die like other dice". We can observe two rolls, and try to ascertain whether or not it is like other fair 6-sided dice; however, two rolls is not enough to be sure.

So as we're gathering data, we start to classify this instance of a thing (a die, an asteroid) as part of a series of things we already know about. The more rolls we observe, the more sure we can be that this is a fair die or a loaded die, for instance.
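
A rough sketch of that updating process, for anyone who wants numbers. Everything specific here is an assumption made for illustration: there are only two hypotheses (fair, or a hypothetical loaded die that shows 1 half the time), the prior that a random die is loaded is 1%, and the name `posterior_loaded` is made up.

```python
def posterior_loaded(rolls, prior_loaded=0.01, loaded_p_one=0.5):
    """Posterior probability that the die is loaded, given the observed rolls.

    Assumes just two hypotheses: a fair die (each face 1/6), or a
    hypothetical loaded die that shows 1 with probability loaded_p_one
    and spreads the remaining probability evenly over the other five faces.
    """
    fair_likelihood = 1.0
    loaded_likelihood = 1.0
    for roll in rolls:
        fair_likelihood *= 1 / 6
        loaded_likelihood *= loaded_p_one if roll == 1 else (1 - loaded_p_one) / 5
    numerator = prior_loaded * loaded_likelihood
    return numerator / (numerator + (1 - prior_loaded) * fair_likelihood)


# Two rolls, both showing 1: suspicious, but far from conclusive.
two_rolls = posterior_loaded([1, 1])

# Twenty rolls, fourteen of them showing 1: now the evidence is strong.
twenty_rolls = posterior_loaded(
    [1, 1, 3, 1, 1, 6, 1, 2, 1, 1, 1, 4, 1, 1, 5, 1, 1, 1, 2, 1]
)

print(f"after 2 rolls:  {two_rolls:.3f}")
print(f"after 20 rolls: {twenty_rolls:.4f}")
```

With these made-up numbers, two 1s in a row leave the "loaded" hypothesis under 10%, while twenty rolls push it above 99%. The same data would give different answers under a different prior, which is the point about classifying the die before you have enough rolls.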

If I'm understanding how asteroids' trajectories are calculated, we can simulate THIS asteroid's trajectory (3% chance of hitting you, based on a little data), or we can just decide to classify it (perhaps prematurely?) in the series "an asteroid like every other asteroid that we've observed" and arrive at a 0.000001% chance of hitting you (I'm making up a number here).

I think you're right. The 3% number must be ignoring repeat sampling bias. This is basically the same issue as p-hacking or false positives in medical testing.

You have one confidence margin for a single measurement and a different confidence margin if you make 1 million measurements.

Let's say you can measure marble diameters and your tool has a calibrated standard deviation of 1 mm.

If you pull one marble and measure it to be 10 mm larger than expected, you can calculate the chance you are wrong using only the standard deviation of your measurement tool.

However, if you pull 1 million marbles and measure one to be 10 mm larger than expected, you need to take into account the number of marbles you have measured.
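
To put rough numbers on the marble example, here is a sketch that assumes the tool's error is Gaussian and independent from marble to marble (neither is stated above). The names `tail_prob` and `prob_any_outlier` are illustrative, and the 4 mm case is added only for contrast.

```python
from math import erfc, expm1, log1p, sqrt


def tail_prob(z: float) -> float:
    """Two-sided chance that a Gaussian error is at least z standard deviations."""
    return erfc(z / sqrt(2))


def prob_any_outlier(z: float, n: int) -> float:
    """Chance that at least one of n independent measurements is off by >= z sigma.

    Computes 1 - (1 - p)**n via log1p/expm1 so that tiny p does not round to zero.
    """
    p = tail_prob(z)
    return -expm1(n * log1p(-p))


sigma_mm = 1.0  # calibrated standard deviation of the tool, from the example

# One marble reads 10 mm off: chance that this is pure measurement error.
single_10mm = tail_prob(10.0 / sigma_mm)

# One million marbles: chance that at least one reads 10 mm off by error alone.
million_10mm = prob_any_outlier(10.0 / sigma_mm, 1_000_000)

# Contrast case (an assumption, not from the example): a 4 mm deviation.
single_4mm = tail_prob(4.0 / sigma_mm)
million_4mm = prob_any_outlier(4.0 / sigma_mm, 1_000_000)

print(single_10mm, million_10mm)
print(single_4mm, million_4mm)
```

Under these assumptions the 10 mm reading stays implausible as an error even across a million marbles, although it is about a million times more likely than for a single pull. The 4 mm case flips completely: very unlikely for one marble, nearly certain to show up somewhere in a million.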

The uncertainty is epistemic, not aleatoric. The percentage represents our knowledge about the system at the time of measurement, propagated through the forward model; it is not an inherent random process in the system/model itself.
If your model is consistently wrong in a statistically predictable way, either your measurement or model is inaccurate.

A 3% chance that never occurs is an inaccurate prediction.
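
For a sense of scale, a quick calculation, assuming the thousand predictions are independent and each one really is a calibrated 3% (`prob_no_hits` is just an illustrative name):

```python
def prob_no_hits(p_hit: float, trials: int) -> float:
    """Chance that an event with per-trial probability p_hit never happens
    in `trials` independent trials."""
    return (1.0 - p_hit) ** trials


p_never = prob_no_hits(0.03, 1000)
print(f"{p_never:.2e}")
```

The result is on the order of 1e-14, so a thousand honest 3% calls with zero hits is not something you would expect to see by luck.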

Right! Yes absolutely!

It's wrong because the measurements are suggestive of possibility, rather than certain of it.

If two poor measurements show that an asteroid is headed away from Earth, that's the end. Look no further.

If we observe an asteroid with two poor measurements that has some significant chance of hitting, more and better measurements are made. Then very often those better measurements show it was never actually going to hit anyhow.

But we never would have known without the better measurements, and we never would have devoted more time to making better measurements without a reason to do so.

A 3% chance that never occurs happens because that 3% is based on data at the limit of what the telescopes can provide, not because of bad math.

Then what does 3% mean? Surely it means "given the data we have, one in every 33 will hit". Since that empirically doesn't happen, it must be that "the data we have" has a very low prior probability of being real. In other words, the measurement noise seems distributed in a way that over-represents unlikely trajectories.

Hence it seems that it would lead to more accurate predictions if the measurements and their uncertainties were fitted to a model that corrects for the prior probability of observing an asteroid on a given trajectory/making a certain observation.
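
A toy version of that correction, using plain Bayes' rule. The numbers are hypothetical and this is not a description of how impact probabilities are actually computed: the likelihoods 0.03 and 0.97 are chosen so that a flat 50/50 prior reproduces the 3% figure, and the one-in-a-million prior is invented.

```python
def posterior_hit(prior_hit: float, p_data_if_hit: float, p_data_if_miss: float) -> float:
    """Bayes' rule: probability of an impact given the observed data."""
    numerator = prior_hit * p_data_if_hit
    return numerator / (numerator + (1.0 - prior_hit) * p_data_if_miss)


# Hypothetical likelihoods of seeing "this data" under each outcome.
P_DATA_IF_HIT = 0.03
P_DATA_IF_MISS = 0.97

# Flat prior: hit and miss treated as equally plausible before looking.
flat = posterior_hit(0.5, P_DATA_IF_HIT, P_DATA_IF_MISS)

# Informed prior (made-up number): impacts are rare among observed asteroids.
informed = posterior_hit(1e-6, P_DATA_IF_HIT, P_DATA_IF_MISS)

print(f"flat prior:     {flat:.4f}")
print(f"informed prior: {informed:.2e}")
```

The same data gives 3% under the flat prior and a far smaller number once the rarity of impacts is folded in, which is the kind of gap the comment above is asking about.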

This discrepancy between distribution of measurement error vs distribution of actual trajectories is what people are wondering about, because it seems interesting to know more about (e.g. "why are certain trajectories less likely?").

Despite all the people coming out of the woodwork with weird theories, my best one is that the 3% number doesn't take into account their entire measurement process and sampling.

It's similar to p-hacking.

You're standing in a four-lane road and see a car approaching. You're looking at an angle and the lanes are poorly marked, so you can't tell which one it's in. Your observation lets you estimate the chance you need to move at 25%.

When it gets a little closer, you can tell at least which half it's on, the left or the right. Now your estimate is either 0% or 50%.

Closer still and you tell which lane it's in, so now you're sure.
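
Here is a small simulation of the four-lane picture, assuming the car is equally likely to be in any lane and that you are standing in lane 0 (both are assumptions for the sketch, as are the names). It checks whether the intermediate estimates are calibrated, i.e. whether "50%" cars end up in your lane about half the time.

```python
import random


def simulate(n_cars: int = 100_000, seed: int = 0) -> dict:
    """Simulate the three-stage estimate for n_cars approaching cars."""
    rng = random.Random(seed)
    my_lane = 0  # lanes 0 and 1 are the left half, lanes 2 and 3 the right half

    stage2_total = 0.0   # sum of the "which half" estimates (0.0 or 0.5)
    hits = 0             # cars that really were in my lane
    same_half_count = 0  # cars that got the 50% estimate
    same_half_hits = 0   # of those, how many were in my lane

    for _ in range(n_cars):
        car_lane = rng.randrange(4)
        same_half = (car_lane // 2) == (my_lane // 2)
        hit = car_lane == my_lane

        stage2_total += 0.5 if same_half else 0.0
        hits += hit
        if same_half:
            same_half_count += 1
            same_half_hits += hit

    return {
        "mean_stage2_estimate": stage2_total / n_cars,
        "hit_rate": hits / n_cars,
        "hit_rate_given_50pct": same_half_hits / same_half_count,
    }


result = simulate()
print(result)
```

In this model the first estimate (25%) matches the overall hit rate, the second-stage estimates average back to roughly 25%, and the cars rated 50% hit about half the time. Estimates that get revised as data comes in are fine; what would be odd is revised estimates that never pan out at their stated rate.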

Again, that makes perfect sense.

What wouldn't make sense is if you repeat this 1000 times and a car is never in your lane.

That means that something is wrong about how you are modeling the road and cars.

The claim that people are confused by is that asteroids with a 3% chance of hitting get that chance revised to 0% more than 97% of the time.

3% seems much higher though. If I crossed the street at 3%, I would probably be dead by now. Cars may not be a great analogy, because they swerve, but it is quite high. Space is pretty damn big too, so the odds of being hit by space things are really low. But unlike cars, space stuff tends to swerve towards the larger bodies.
> But unlike cars, space stuff tends to swerve towards the larger bodies.

That's exactly it. And at the speeds these objects are going and the uncertainty of the observations you would have to be observing an object for a really long time to get the kind of accuracy required to pick a mitigation method that would work. And even then, assuming you could nail the point of impact of something going 2000 km / second of unknown mass in a strong gravity field: given the COVID response I have a hard time believing that the response to 'Houston, Texas is going to be obliterated on Jun 1st 2024' would be met with anything but skepticism and laughter. Right up to the moment of impact.