Hacker News
by canadaduane 1044 days ago
I wonder if it's the difference between "this asteroid" and "all asteroids". As we learn more about it, we can start to treat it like a process that has repeated, but initially we can't be sure if it's like other asteroids.

Consider a roll of a 6-sided die. What is the chance it comes up 1?

A person might think, "1 in 6". But what if this is a loaded die? In that case, we need more information before we can classify it as "a die like other dice". We can observe two rolls and try to work out whether it behaves like other fair 6-sided dice; however, two rolls are not enough to be sure.

So as we're gathering data, we start to classify this instance of a thing (a die, an asteroid) as part of a series of things we already know about. The more rolls we observe, the more sure we can be that this is a fair die or a loaded die, for instance.
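To make the die example concrete, here is a minimal Bayesian sketch of that updating. The numbers are made up for illustration: it assumes a hypothetical loaded die that lands on 1 half the time, and a hypothetical 1% prior that any given die is loaded. Neither figure comes from the discussion above.

```python
from fractions import Fraction

# Hypothetical models (assumptions, not from the thread): a fair die, and a
# loaded die that lands on 1 half the time and splits the rest evenly over 2-6.
FAIR = {face: Fraction(1, 6) for face in range(1, 7)}
LOADED = {1: Fraction(1, 2), **{face: Fraction(1, 10) for face in range(2, 7)}}


def posterior_loaded(rolls, prior_loaded=Fraction(1, 100)):
    """Return P(loaded | rolls) via Bayes' rule, given an assumed prior."""
    like_fair = Fraction(1)
    like_loaded = Fraction(1)
    for r in rolls:
        # Rolls are treated as independent, so likelihoods multiply.
        like_fair *= FAIR[r]
        like_loaded *= LOADED[r]
    numerator = prior_loaded * like_loaded
    evidence = numerator + (1 - prior_loaded) * like_fair
    return numerator / evidence


print(posterior_loaded([1, 1]))          # two 1s in a row
print(float(posterior_loaded([1] * 10)))  # ten 1s in a row
```

Under those assumptions, two 1s in a row move the estimate from 1% to exactly 1/12 (about 8%), which is suggestive but far from sure; ten 1s in a row push it above 99%.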

If I'm understanding how asteroids' trajectories are calculated, we can simulate THIS asteroid's trajectory (3% chance of hitting you, based on a little data), or we can just decide to classify it (perhaps prematurely?) in the series "an asteroid like every other asteroid that we've observed" and arrive at a 0.000001% chance of hitting you (I'm making up a number here).


I think you're right. The 3% number must be ignoring the bias from repeated sampling (the multiple-comparisons problem). This is basically the same issue as p-hacking, or as false positives in medical testing.

You have one confidence margin for a single measurement and a different confidence margin if you make 1 million measurements.

Let's say you can measure marble diameters and your tool has a calibrated standard deviation of 1 mm.

If you pull one marble and measure it to be 10 mm larger than expected, you can calculate the chance you are wrong using only the standard deviation of your measurement tool.
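The single-marble calculation could look like the sketch below. It assumes the tool's error is normally distributed and that the whole deviation is attributed to the tool; `tail_probability` is a hypothetical helper name, not anything from the thread.

```python
import math


def tail_probability(deviation_mm, sigma_mm=1.0):
    """One-sided probability that a Gaussian measurement error is at least
    deviation_mm larger than expected, i.e. P(Z >= z) with z = deviation/sigma."""
    z = deviation_mm / sigma_mm
    return 0.5 * math.erfc(z / math.sqrt(2))


# A 10 mm excess with a 1 mm standard deviation is a 10-sigma event.
print(tail_probability(10.0))
```

With the 1 mm standard deviation from the example, the sketch puts the one-sided probability of a 10 mm excess below 1e-23.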

However, if you pull 1 million marbles and measure one to be 10 mm larger than expected, you need to take into account the number of marbles you have measured.
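A rough sketch of how the number of measurements changes things, assuming independent, normally distributed tool errors. It computes the chance that at least one of n measurements looks extreme purely by chance, 1 - (1 - p)^n, plus the Bonferroni-adjusted per-measurement threshold, which is one standard (conservative) way to correct for many comparisons. The function names are hypothetical.

```python
import math


def one_sided_tail(z):
    """P(Z >= z) for a standard normal Z (assumes Gaussian tool error)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def chance_of_any_false_alarm(z, n_measurements):
    """Probability that at least one of n independent measurements shows an
    error of z standard deviations or more by chance: 1 - (1 - p)^n.
    log1p/expm1 keep this accurate when p is tiny and n is huge."""
    p = one_sided_tail(z)
    return -math.expm1(n_measurements * math.log1p(-p))


def bonferroni_threshold(alpha, n_measurements):
    """Per-measurement significance level that keeps the chance of any false
    alarm across n measurements at or below alpha (Bonferroni correction)."""
    return alpha / n_measurements


for z in (5, 10):
    print(z, chance_of_any_false_alarm(z, 1), chance_of_any_false_alarm(z, 1_000_000))
print(bonferroni_threshold(0.05, 1_000_000))
```

Under these assumptions, the correction matters most for moderate deviations: a 5-sigma reading turns up somewhere among a million marbles roughly 25% of the time, while a 10-sigma reading stays below 1e-17 even after accounting for n.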