| Interesting article. I do something related, and here's my take: Data mining is useful because it gives you things that are predictive that you might not have considered at first, but that make sense afterwards. This is mainly due to the combinatorial explosion in the number of potential formulas. You generally have a vague idea of what might be predictive, e.g. cheapness relative to earnings and cash flow, but there's a huge number of ways that might show up in the data, and a huge number of ways it might hide in the data. So for instance an old-school analyst might rank stocks by price/earnings as well as cash flow, or by whatever bespoke formula they like. A data mining approach could take all the fundamentals and generate formulas mixing the variables, yielding a number of candidates that seem to be effective. From those, you'd pick out the ones that capture some thesis (low P/E, upward trend in earnings). Then you'd look at whether each formula is sensitive to small tweaks. For instance, if you regressed the last 6 earnings and it had phenomenal performance, but with 5 or 7 it didn't, you'd probably conclude it's some sort of random result. There are funds that take the mass approach to an extreme. They have huge databases, with a genetic algorithm that generates expression trees, and a battery of stats (incl. backtests) to decide what works. They end up with many thousands of strategies that are a great deal more effective than your standard one-trick pony fund. |
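To make the "sensitive to small tweaks" check concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `sweep`, `is_robust`, the `min_ratio` threshold and the toy performance numbers are hypothetical, and `backtest` stands in for whatever scoring you actually run. The idea is just to score the formula at the chosen parameter and at its immediate neighbours, and to distrust it if the neighbours fall off a cliff.

```python
from typing import Callable, Dict, Iterable


def sweep(backtest: Callable[[int], float], windows: Iterable[int]) -> Dict[int, float]:
    """Run a (hypothetical) backtest once per lookback window and collect the scores."""
    return {w: backtest(w) for w in windows}


def is_robust(results: Dict[int, float], chosen: int, min_ratio: float = 0.5) -> bool:
    """Return True if each immediate neighbour of `chosen` keeps at least
    `min_ratio` of the chosen window's performance.

    The 0.5 default is an arbitrary illustrative threshold, not a standard.
    """
    best = results[chosen]
    if best <= 0:
        return False  # nothing worth protecting if the chosen window doesn't work
    neighbours = [results[w] for w in (chosen - 1, chosen + 1) if w in results]
    return bool(neighbours) and all(r >= min_ratio * best for r in neighbours)


# Toy numbers (made up): performance by number of earnings periods regressed.
spiky = {5: 0.1, 6: 2.0, 7: 0.2}   # great at 6, poor at 5 and 7
smooth = {5: 1.6, 6: 2.0, 7: 1.7}  # degrades gently around 6

print(is_robust(spiky, 6))   # the 6-period result looks like noise
print(is_robust(smooth, 6))  # the 6-period result survives small tweaks
```

In practice you would pass your own backtest into `sweep` and probably look at more than the two nearest neighbours, but the shape of the test is the same.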