

by 23iofj 1987 days ago

Remember that these are correlational studies! They're not directly comparing raw counts of data points; they're checking for statistical significance.

It can be that

    COUNT(failure -> failure) > COUNT(success -> failure)

while also being the case that "there is not a statistically significant correlation between past success and future success".
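
A back-of-the-envelope sketch of why the count inequality alone says nothing about correlation. The numbers are my own assumptions for illustration (1,000 founders, two ventures each, a 90% failure rate, second outcome independent of the first), not data from any study:

```python
from fractions import Fraction

# Hypothetical setup: every venture fails with probability 9/10, and the
# second outcome is independent of the first (zero correlation by construction).
n_founders = 1000
p_fail = Fraction(9, 10)

# Expected number of founders in each (first -> second) cell.
expected_fail_then_fail = n_founders * p_fail * p_fail
expected_success_then_fail = n_founders * (1 - p_fail) * p_fail

print(expected_fail_then_fail, expected_success_then_fail)  # 810 90
```

The inequality holds by a wide margin even though the two outcomes are uncorrelated by construction, so the raw counts can't be what a significance test is picking up.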

Think about generating a dataset using the process you outline and then performing a statistical test for correlation on the resulting dataset.

Think about the percentages in steps 2 and 3. If those get small enough, a significance test on your generated dataset can pick up the (failure, failure) correlation while failing to pick up the (success, success) correlation, simply because there are too few successes to work with.

The 90% number [0] explains how those percentages get small enough that (success, success) is not picked up by a significance test but (failure, failure) is.

You don't have to take my word for it, though. You can actually implement this process, run your favorite test for correlation, and verify that as those success probabilities get small you have the above effect.
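
Here's a minimal sketch of that experiment in plain Python. Everything in it is an assumption on my part: the two-venture process, the parameter names (`p_first`, `p_after_success`, `p_after_failure`), the sample size of 300, and the choice of Fisher's exact test as the "favorite test". Swap in your own process and test.

```python
import random
from math import comb


def simulate(n_founders, p_first, p_after_success, p_after_failure, rng):
    """Draw two venture outcomes per founder; return counts keyed by (first, second)."""
    counts = {("S", "S"): 0, ("S", "F"): 0, ("F", "S"): 0, ("F", "F"): 0}
    for _ in range(n_founders):
        first = "S" if rng.random() < p_first else "F"
        # Hypothetical dependence: second-venture odds depend on the first outcome.
        p_second = p_after_success if first == "S" else p_after_failure
        second = "S" if rng.random() < p_second else "F"
        counts[(first, second)] += 1
    return counts


def fisher_exact_two_sided(counts):
    """Two-sided Fisher exact p-value for the 2x2 table of outcomes."""
    a, b = counts[("S", "S")], counts[("S", "F")]
    c, d = counts[("F", "S")], counts[("F", "F")]
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x (success, success) pairs given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


if __name__ == "__main__":
    rng = random.Random(0)
    # (p_first, p_after_success, p_after_failure); all values are made up.
    scenarios = {
        "independent outcomes": (0.10, 0.10, 0.10),
        "modest success boost": (0.10, 0.15, 0.10),
    }
    for name, (p_first, p_ss, p_fs) in scenarios.items():
        counts = simulate(300, p_first, p_ss, p_fs, rng)
        print(name, counts, "p =", round(fisher_exact_two_sided(counts), 3))
```

In both scenarios COUNT(failure -> failure) should dwarf COUNT(success -> failure). In the first, any "significant" result is a false positive by construction. In the second, with only about 30 first-time successes expected out of 300, the test will often fail to reach p < 0.05 even though the boost is real. Rerun with different seeds and success probabilities to see how the p-values move.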

What you've proven above is that

    COUNT(failure -> failure) > COUNT(success -> failure)

But just because this is true doesn't mean that there will be a statistically significant success -> success correlation.

Again, the most fundamental reason that can happen is that failure rates are over 50% [0].

[0] I mentioned in my first comment that you can get this result even with a 50% failure rate. How? Companies and founders aren't 1:1, founders drop out of the data generation process, etc. You can play with those factors to create similar effects even in extreme cases, like failure rates dropping to 50%, but it'd be a bit contrived.