by jrochkind1 3231 days ago
The problem is that "making the data easier to analyze" may make the analysis invalid. Your response is not increasing my faith that you carefully considered what removing the outliers would do to the validity of the analysis.

> Not sure how narrowing down the top analysts is a flaw here.

Potentially, because how did you decide what "top analysts" were? If it's using the same methods you used to determine they were successful, it just means analysts that come out of your math come out of your math.

If 50K people flip 10 coins, one of them might flip 10 heads. It doesn't mean that person is better at flipping heads. We could in fact calculate the chances of one of 50K people flipping ten heads. If I decided it meant that some people really were better at flipping heads, I'd probably be wrong. (Although if I calculated the chances and discovered it was like a one in bazillion chance that even one of 50K people would flip ten heads... I'd probably at least consider that they might be better at flipping heads! But I'd probably run the experiment again. :) )

If I pick the top 100 heads-flippers from my 50K coin flippers, and show that they really are better at flipping heads because they flipped more heads in the same dataset that I used to pick them as the top 100 heads-flippers in the first place --- I haven't really shown that at all. By "narrowing down top analysts", depending on how you did it, it's possible you simply found the analysts who got lucky, while ignoring the ones who didn't.
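To make that concrete, here is a minimal simulation sketch of the coin-flipper thought experiment. Everything in it (the function names, the fixed seed, the choice of a fresh second round of 10 flips) is my own assumption for illustration, not anything taken from the analysis being discussed. It picks the top 100 flippers from one round, then scores the same people on a new round.

```python
import random


def flip_heads(n_flips, rng):
    """Count heads in n_flips fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def selection_demo(n_people=50_000, n_flips=10, top_k=100, seed=0):
    """Pick the top_k flippers from one round, then re-test them on a fresh round.

    Returns (in_sample_mean, out_of_sample_mean) heads per person.
    All parameters are hypothetical values for the thought experiment.
    """
    rng = random.Random(seed)

    # Round 1: everyone flips; this is the dataset used to pick "top" flippers.
    first_round = [flip_heads(n_flips, rng) for _ in range(n_people)]
    top = sorted(range(n_people), key=lambda i: first_round[i], reverse=True)[:top_k]

    # Scoring the winners on the data that selected them looks impressive.
    in_sample = sum(first_round[i] for i in top) / top_k

    # Round 2: the same people flip again; nobody has any real skill here.
    out_of_sample = sum(flip_heads(n_flips, rng) for _ in top) / top_k
    return in_sample, out_of_sample


if __name__ == "__main__":
    in_sample, out_of_sample = selection_demo()
    print(f"selected group, original data: {in_sample:.2f} heads out of 10")
    print(f"selected group, fresh data:    {out_of_sample:.2f} heads out of 10")
```

Measured on the data that selected them, the "top" flippers average at least 9 heads out of 10. On fresh flips they fall back to roughly 5, which is what pure luck predicts.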

Statistical analysis is _tricky_.


If 50,000 people each flip 10 coins, it's actually overwhelmingly likely that someone will get 10 heads. The chance that it doesn't happen is about one in a sextillion (10^21).
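For anyone who wants to check that figure, here is a quick sketch of the arithmetic, assuming fair coins and independent flippers (the function name is just something I made up for the example). The chance that one person misses 10 heads is 1 - 1/1024, and that has to happen 50,000 times in a row.

```python
import math


def prob_nobody_gets_all_heads(n_people=50_000, n_flips=10):
    """Probability that none of n_people flips all heads in n_flips fair flips."""
    p_all_heads = 0.5 ** n_flips  # 1/1024 for 10 flips
    # (1 - p)^n computed via logs; log1p keeps precision when p is small.
    return math.exp(n_people * math.log1p(-p_all_heads))


if __name__ == "__main__":
    p_none = prob_nobody_gets_all_heads()
    print(f"P(nobody flips 10 heads) = {p_none:.3g}")
    print(f"That is about one in {1 / p_none:.3g}")
```

The result comes out a little under 1e-21, which is consistent with "about one in a sextillion".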
It's a completely different matter. As far as I know, analysts do actual analysis when pricing stocks. They may not be very good at it, or may not discount all the factors, but analysing companies' products, revenues, returns and other parameters seems extremely different from flipping coins to me. So your analysis has no basis whatsoever, given that you are comparing the completely random outcome of a well-understood physical action to a chaotic system (the market) in which at least the basic factors influencing its constituents are well understood. Or are you suggesting, for example, that Warren Buffett has just been lucky for decades on end, and that you and everyone else know how to at least equal his performance? If that is what you think, then please, I would be rather amused to see your performance as an investor compared to his over the course of several decades.
Well, see, that's the whole deal, investigating _how_ different it is from flipping coins. That's the whole question, really. Starting with the assumption that they _must_ be doing better than chance is not the right place to start in order to analyze whether they are or not.

Most statistical analysis is about trying to distinguish meaningful results (implying a repeatable correlation of some kind that means something) from random chance with no meaning. The whole point is you _don't_ start out knowing if the thing you are investigating is random chance or not; if you did, you wouldn't need to analyze it. That's what statistical analysis is for. In part because we humans are really, really good at finding patterns and assuming a meaningful correlation when in fact it's just random chance.

The coin example is useful because we all know (or define for the sake of the discussion) that it must be random chance, so any analysis that appeared to say it wasn't is probably in error. And using that same sort of analysis on something where you don't know how much of the effect is due to random chance is not going to answer the question.

Funny you mention Buffett: he's about to win his bet that a set of many hedge funds would fail to beat the market over a decade.
I mention him because, according to the parent message, he is apparently only a coin thrower, and all he can give us is insight into the percentage of people who can get 10 heads in a row.
If someone tried to use Buffett as an example of beating the market but did zero statistical analysis to determine how likely it is to be actual skill, they would also deserve to be dismissed.
I probably should have included the outliers when analyzing overall performance but if I recall correctly they did not have a significant effect.

The top analysts were determined by average performance over the year after their ratings were made. This isn't the top analysts out of 50,000; it's the top out of 50 or so analyst-rating pairs. There were only 16 or so analysts in total that I looked at. This isn't an instance of survivor bias, as your example suggests. If I were to be more rigorous, I could give a statistical test for this.

Looking for top performers always invokes survivor bias. It's a classic data snooping issue where the common sense approach is exactly wrong, but it'll sell a lot of books and it'll convince people that you know what you're doing as a stock analyst when it's just random luck.
Top 10 performers out of 16 or so analysts in the analysis is not survivor bias.
Sure it is, it was survivor bias when you selected the 16.