1. Take a population of 100 people. There is a 50% chance that a random person from that population can create a successful business (obviously 50% is a made-up number).
2. All 100 attempt to start a business. 50 succeed, 50 fail.
3. The 50 that failed now have a 25% chance of succeeding in their future business endeavors. The 50 that succeeded apparently have the same 50% chance of success.
4. Now, given the same population, there is only a 37.5% chance that a random person will succeed in their next business, which is in direct contradiction to point number 1.
I'm not entirely sure I did that right. I'm no statistician so there may be some glaring logical flaws there, but that seems correct according to my intuition.
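For what it's worth, the 37.5% in step 4 can be checked with a few lines of arithmetic. This is only a sketch of the numbers in the list above; the variable names are made up for illustration and the 50%/25% figures are the same invented ones.

```python
# Made-up figures from steps 1-3 above (not real startup data).
p_first = 0.5          # chance the first business succeeds
p_after_success = 0.5  # chance of success for those who succeeded before
p_after_fail = 0.25    # chance of success for those who failed before

# Weight each group's second-attempt chance by the group's share of the population.
p_next = p_first * p_after_success + (1 - p_first) * p_after_fail
print(p_next)  # 0.375
```

So the 37.5% figure follows from the stated assumptions; it describes the second attempt, averaged over both groups.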
> There is a 50% chance that a random person from that population can create a successful business... All 100 attempt to start a business. 50 succeed, 50 fail.
This is the most glaring wrong assumption that causes your and GP's confusion.
90+% of startups fail.
Note: "failure predicts failure but success does not predict success" could still be true even if business failure rates were <= 50%! But overlooking that failure rates are well above 50% is the first and simplest mistake in this line of reasoning.
I know nowhere near 50% of startups succeed (and have heard the 90% failure rate many times). However, I don't think that is relevant to the mathematics of it.
Remember that these are correlational studies! They're not directly comparing raw counts of data points; they're checking for statistical significance.
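It can be the case that "there is a statistically significant correlation between past failure and future failure" while also being the case that "there is not a statistically significant correlation between past success and future success".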
Think about generating a dataset using the process you outline and then performing a statistical test for correlation on the resulting dataset.
Think about the percentages in step 2 and 3. If those get small enough, then there could be a statistically significant (failure, failure) correlation in your generated dataset and also not a statistically significant (success, success) correlation in your generated dataset.
The 90% number [0] explains how those percentages get small enough that (success, success) is not picked up by a significance test but (failure, failure) is.
You don't have to take my word for it, though. You can actually implement this process, run your favorite test for correlation, and verify that as those success probabilities get small you have the above effect.
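A minimal sketch of that experiment, under stated assumptions: `simulate` and `two_proportion_z_test` are hypothetical names, the halving of the success rate after a failure is borrowed from the 50% -> 25% example upthread, and the two-proportion z-test is just one stand-in for "your favorite test". It uses only the standard library, and it does not model founders dropping out of the data.

```python
import math
import random

def simulate(n, p_first, p_after_fail, p_after_success, seed=0):
    """Generate n founders with two ventures each; return counts keyed by (first_ok, second_ok)."""
    rng = random.Random(seed)
    table = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for _ in range(n):
        first_ok = rng.random() < p_first
        # The second-venture success chance depends on the first outcome.
        p_second = p_after_success if first_ok else p_after_fail
        second_ok = rng.random() < p_second
        table[(first_ok, second_ok)] += 1
    return table

def two_proportion_z_test(table):
    """Two-sided z-test: does the second-venture success rate differ by first outcome?"""
    s1 = table[(True, True)]
    n1 = s1 + table[(True, False)]   # founders whose first venture succeeded
    s0 = table[(False, True)]
    n0 = s0 + table[(False, False)]  # founders whose first venture failed
    if n1 == 0 or n0 == 0:
        return float("nan"), float("nan")
    pooled = (s1 + s0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return float("nan"), float("nan")
    z = (s1 / n1 - s0 / n0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

if __name__ == "__main__":
    # Try progressively smaller base success rates and watch the table and p-value.
    for p in (0.5, 0.1, 0.02):
        table = simulate(n=1000, p_first=p, p_after_fail=p / 2, p_after_success=p)
        z, p_value = two_proportion_z_test(table)
        print(f"p={p}: table={table}, z={z:.2f}, p-value={p_value:.4f}")
```

Varying `n`, `p`, and `seed` shows how few prior-success founders end up in the data once the base rate gets small, which is the part to look at when judging what a significance test can pick up.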
But just because this is true doesn't mean that there will be a statistically significant success -> success correlation.
Again, the most fundamental reason that can happen is because failure rates are over 50% [0].
--
[0] I mentioned in my first comment that you can get this result even with a 50% failure rate. How? Companies and founders aren't 1:1, founders drop out of the data generation process, etc. You can play with that to create similar effects even in extreme cases like failure rates dropping to 50%, but it'd be a bit contrived.
- Prior failure, most likely the next venture will fail.
- Prior success, most likely the next venture will fail.
Basically, odds are that a venture will fail regardless of past performance, similar to how past lottery winning doesn't predict future lottery winning. Personally, I don't think it's necessarily true (successful founders will already have an existing audience and investors for their next product), but mathematically this could be one way it holds true.