Hacker News
by s1artibartfast 1044 days ago
That makes perfect sense. Where it breaks down is if you put percentages on it. If you say the car has a 3% chance of hitting you, you repeat the process a thousand times, and it never hits you, something is wrong with your math.
3 comments

I wonder if it's the difference between "this asteroid" and "all asteroids". As we learn more about it, we can start to treat it like a process that has repeated, but initially we can't be sure if it's like other asteroids.

Consider a 6-sided dice roll. What is the chance it will roll a 1?

A person might think, "1 in 6". But what if this is a loaded die? In that case, we need more information before we can classify it as "a die like other dice". We can observe two rolls and try to ascertain whether or not it is like other fair 6-sided dice; however, two rolls are not enough to be sure.

So as we're gathering data, we start to classify this instance of a thing (a die, an asteroid) as part of a series of things we already know about. The more rolls we observe, the more sure we can be that this is a fair die or a loaded die, for instance.
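To make the "more rolls, more sure" idea concrete, here is a minimal sketch of Bayesian updating between two candidate dice. The loaded die (rolls a 1 half the time), the 50/50 prior, and the name `posterior_fair` are all made up for illustration; nothing here is specific to asteroids.

```python
from fractions import Fraction

# Hypothetical models: a fair die, and a loaded die that rolls a 1 half the time.
FAIR = {face: Fraction(1, 6) for face in range(1, 7)}
LOADED = {1: Fraction(1, 2), **{face: Fraction(1, 10) for face in range(2, 7)}}


def posterior_fair(rolls, prior_fair=Fraction(1, 2)):
    """Return P(die is fair | observed rolls) using Bayes' rule over the two models."""
    weight_fair = prior_fair
    weight_loaded = 1 - prior_fair
    for roll in rolls:
        # Multiply in the likelihood of this roll under each model.
        weight_fair *= FAIR[roll]
        weight_loaded *= LOADED[roll]
    return weight_fair / (weight_fair + weight_loaded)


print(posterior_fair([1, 1]))              # 1/10: suspicious, but not conclusive
print(posterior_fair([1, 1, 1, 1, 1, 1]))  # 1/730: now it is hard to believe the die is fair
```

Two rolls move the estimate but leave real doubt; six rolls nearly settle it. The answer also depends on the prior and on which loaded die you assume, which is the "which series does this thing belong to" question.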

If I'm understanding how asteroids' trajectories are calculated, we can simulate THIS asteroid's trajectory (3% chance of hitting you, based on a little data), or we can just decide to classify it (perhaps prematurely?) in the series "an asteroid like every other asteroid that we've observed" and arrive at a 0.000001% chance of hitting you (I'm making up a number here).

I think you're right. The 3% number must be ignoring repeat sampling bias. This is basically the same issue as p-hacking or false positives in medical testing.

You have one confidence margin for a single measurement and a different confidence margin if you make 1 million measurements.

Let's say you can measure marble diameters and your tool has a calibrated standard deviation of 1 mm.

If you pull one marble and measure it to be 10 mm larger than expected, you can calculate the chance you are wrong using only the standard deviation of your measurement tool.

However, if you pull 1 million marbles and measure one to be 10 mm larger than expected, you need to take into account the number of marbles you have measured.
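A rough sketch of that correction, assuming the tool's error is Gaussian and the measurements are independent (both are my assumptions, and the function names are made up):

```python
import math


def tail_probability(deviation_mm, sigma_mm=1.0):
    """One-sided chance that Gaussian noise alone gives a reading at least deviation_mm too large."""
    return 0.5 * math.erfc(deviation_mm / (sigma_mm * math.sqrt(2)))


def chance_of_any_false_alarm(p_single, n_measurements):
    """Chance that at least one of n independent measurements crosses the threshold by noise alone."""
    # Computes 1 - (1 - p)^n in a way that stays accurate when p is tiny.
    return -math.expm1(n_measurements * math.log1p(-p_single))


for deviation in (4.0, 10.0):
    p = tail_probability(deviation)
    print(deviation, p, chance_of_any_false_alarm(p, 1_000_000))
```

With the literal numbers above (10 mm at a 1 mm standard deviation) the chance of a noise-only outlier is still negligible after a million marbles, just about a million times larger than for one marble. At 4 mm, which I picked only to show the effect, a single reading is wrong by noise about 3 times in 100,000, but seeing at least one such reading among a million marbles is nearly certain.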

The uncertainty is epistemic, not aleatoric. The percentage represents our knowledge about the system at the time of measurement, propagated through the forward model, and is not an inherent random process in the system/model itself.
If your model is consistently wrong in a statistically predictable way, either your measurement or model is inaccurate.

A 3% chance that never occurs is an inaccurate prediction.
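For a sense of scale, here is a quick check of how unlikely "never" is if the 3% figures were calibrated and the events independent (independence is my simplifying assumption; `prob_no_hits` is just an illustrative name):

```python
def prob_no_hits(p_hit, n_trials):
    """Chance of zero hits in n independent trials that each hit with probability p_hit."""
    return (1.0 - p_hit) ** n_trials


print(prob_no_hits(0.03, 1000))  # roughly 6e-14
print(0.03 * 1000)               # expected number of hits: 30
```

So a thousand calibrated 3% predictions with zero hits is not bad luck; about 30 hits would be expected.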

Right! Yes absolutely!

It's wrong because the measurements are suggestive of possibility, rather than certain of it.

If we observe an asteroid that with two poor measurements is determined to be headed away from Earth, that's the end. Look no further.

If we observe an asteroid with two poor measurements that has some significant chance of hitting, more and better measurements are made. Then very often those better measurements show it was never actually going to hit anyhow.

But we never would have known without the better measurements, and we never would have devoted more time to making better measurements without a reason to do so.

A 3% chance that never occurs is because that 3% is based on data that's at the limit of what the telescopes can provide, not based upon bad math.

Then what does 3% mean? Surely it means "given the data we have, one in every 33 will hit". Since that empirically doesn't happen, it must be that "the data we have" has a very low prior probability of being real. In other words, the measurement noise seems distributed in a way that over-represents unlikely trajectories.

Hence it seems that it would lead to more accurate predictions if the measurements and their uncertainties were fitted to a model that corrects for the prior probability of observing an asteroid on a given trajectory/making a certain observation.

This discrepancy between distribution of measurement error vs distribution of actual trajectories is what people are wondering about, because it seems interesting to know more about (e.g. "why are certain trajectories less likely?").

Despite all the people coming out of the woodwork with weird theories, my best one is that the 3% number doesn't take into account their entire measurement process and sampling.

It is similar to p-hacking.

I don't think you understand how this works at all. You might read up on this here if you want to learn more. https://astronomy.stackexchange.com/questions/8450/how-is-th...

If you just want to argue with people, feel free. But based on how this conversation has been going it doesn't seem like you want to learn.

You're standing in a four-lane road and see a car approaching. You're looking at an angle and the lanes are poorly marked, so you can't tell which one it's in. Your observation lets you estimate the chance you need to move at 25%.

When it gets a little closer, you can tell at least which half it's on, the left or the right. Now your estimate is either 0% or 50%.

Closer still and you tell which lane it's in, so now you're sure.
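A small simulation of this road, assuming the car is equally likely to be in any of the four lanes (that uniform assumption and all the names are mine), shows what calibrated estimates look like over many repeats:

```python
import random


def simulate(n_trials=100_000, lanes=4, my_lane=0, seed=0):
    """Return (overall hit rate, hit rate among trials where the mid-range estimate was 50%)."""
    rng = random.Random(seed)
    hits = 0
    same_half_trials = 0
    same_half_hits = 0
    for _ in range(n_trials):
        car_lane = rng.randrange(lanes)  # assumed uniform over lanes
        hit = car_lane == my_lane
        hits += hit
        # Second look: we can only tell which half of the road the car is on.
        if (car_lane < lanes // 2) == (my_lane < lanes // 2):
            same_half_trials += 1
            same_half_hits += hit
    return hits / n_trials, same_half_hits / same_half_trials


overall, when_told_half = simulate()
print(overall)         # close to 0.25, matching the first estimate
print(when_told_half)  # close to 0.50, matching the second estimate
```

Each individual estimate ends up revised to 0% or 100%, but across many repeats the early 25% and 50% figures match how often the car really is in your lane.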

Again, that makes perfect sense.

What wouldn't make sense is if you repeat this 1000 times and a car is never in your lane.

That means that something is wrong about how you are modeling the road and cars.

The claim that people are confused by is that asteroids with a 3% chance of hitting get the chance revised to 0% more than 97% of the time.