| Alternative take: there isn't that much low-hanging fruit there. Hear me out. "To the person who only has a hammer, everything looks like a nail." The data in front of you is the data you want to analyze, but it doesn't follow that it is the data you ought to analyze. I predict that most of the data you look at will yield nothing: the null hypothesis will not be rejected in the vast majority of cases. I think we, as machine learning learners, have a fantasy that the signal is lurking and that if we just employ that one very clever technique, it will emerge. Sure, random forests failed, and neural nets failed, and the SVR failed, but if I reduce the step size, plug the output of the SVR into the net, and change the kernel... Let me give an example: suppose you want to predict the movement of the stock market using the movement of the stars. Adding more information about the stars and more techniques may feel like progress, but it isn't. Conversely, even a single piece of simple information that requires minimal analysis (this company's sales are way up and no one but you knows it) would be very useful in making that prediction. The first data set is rich but simply doesn't have the required signal. The second is simple but has it. Data that is widely available is unlikely to have unextracted signal left in it. |
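
To make the stars-versus-sales point concrete, here is a rough simulation sketch, not a real market model. Everything in it is an assumption made up for illustration: the "returns" are synthetic, the 50 "star" features are pure noise, the single "sales" feature drives returns with an arbitrary coefficient of 0.8, and `fit_line`, `r_squared`, and `simulate` are hypothetical helper names. It picks the star feature that looks best in-sample and then checks how it holds up on held-out data.

```python
import random
import statistics


def fit_line(x, y):
    """One-feature ordinary least squares: returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """R^2 of a fitted line on (x, y); it can be negative on held-out data."""
    mean_y = statistics.fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def simulate(n_train=200, n_test=200, n_star_features=50, seed=0):
    rng = random.Random(seed)
    n = n_train + n_test

    # Assumption: one simple feature ("sales") actually drives the target.
    sales = [rng.gauss(0, 1) for _ in range(n)]
    returns = [0.8 * s + rng.gauss(0, 1) for s in sales]

    # Assumption: the rich data set ("stars") is independent noise.
    stars = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_star_features)]

    y_train, y_test = returns[:n_train], returns[n_train:]

    def evaluate(feature):
        """Fit on the training split; return (train R^2, test R^2)."""
        x_train, x_test = feature[:n_train], feature[n_train:]
        intercept, slope = fit_line(x_train, y_train)
        return (
            r_squared(x_train, y_train, intercept, slope),
            r_squared(x_test, y_test, intercept, slope),
        )

    # Keep the star feature that looks best in-sample, as a hopeful analyst would.
    best_star = max((evaluate(f) for f in stars), key=lambda scores: scores[0])
    return {"best_star": best_star, "sales": evaluate(sales)}


result = simulate()
print("best star feature (train R^2, test R^2):", result["best_star"])
print("sales feature     (train R^2, test R^2):", result["sales"])
```

Under these assumptions, the best star feature should show a small positive in-sample fit that comes purely from searching over 50 candidates, and that fit should vanish on the held-out half, while the one sales feature keeps its explanatory power. Adding more star features only makes the in-sample illusion stronger.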