| Quant fund insider here. The data is pretty pure, in the sense that it tells you nothing about what it is: it's literally just a bunch of numbers and 0/1 labels, with no metadata at all. It's hard to implement a strategy without knowing what exactly you're looking at. I get the feeling this "pure dataset" is part of some framework that Numerai thinks will beat the market, given good predictors. That's not necessarily the case. Say I assume the 0/1 means up/down over some period. Well, being able to guess 0/1 correctly would obviously help. If I'm right 70% of the time, I can equal-weight my bets and it will be just swell. But if I'm right only about 51% of the time, it's going to take a lot longer for the law of large numbers to work in my favour. Remember, your ML algo can only give you good predictions if some of the 21 features are actually meaningful, and we have no reason to think they are. Now, let's say I have some domain knowledge in finance and want to predict relative over/underperformance. I would be able to guess which shares go up relative to others, but not the market factor. That would require a different framework from the one I'm supposing is presented here. Is there flexibility for that? The secrecy thing makes me wonder, too. If it's just a matter of not showing your work, why not just have a website where people submit their daily/weekly/monthly portfolios and you keep track of the tally? |
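To put rough numbers on the 51% vs 70% point, here is a toy calculation. It assumes each bet independently pays +1 when the prediction is right and -1 when it is wrong (a big simplification, since real returns are neither symmetric nor independent), and asks how many bets it takes before the expected cumulative P&L sits `z` standard deviations above zero. The function name `bets_needed` is just illustrative.

```python
import math


def bets_needed(hit_rate: float, z: float = 2.0) -> float:
    """Approximate number of independent +1/-1 bets needed for the expected
    cumulative P&L to reach `z` standard deviations above zero.

    Toy assumption: each bet returns +1 with probability `hit_rate`, else -1.
    """
    edge = 2 * hit_rate - 1       # expected P&L of a single bet
    variance = 1 - edge ** 2      # variance of a single +1/-1 bet
    # After n bets: mean = n * edge, std = sqrt(n * variance).
    # Setting n * edge / sqrt(n * variance) = z and solving for n:
    return z ** 2 * variance / edge ** 2


for p in (0.70, 0.55, 0.51):
    print(f"hit rate {p:.0%}: ~{round(bets_needed(p))} bets for a 2-sigma result")
```

Under these assumptions that works out to roughly 21 bets at 70%, about 400 at 55%, and about 10,000 at 51%: the number of bets needed scales like 1/edge², which is why a thin edge needs a lot of repetitions.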
That's actually very far from true. If you trade a single instrument, sure, the variance will kill you in anything but the very long run. But if you trade thousands of securities (say, the entire US equity market), then a 55% prediction accuracy and a market-neutral strategy will absolutely crush. Even if you blindly buy/sell on every signal without any sort of weighting (excluding low-confidence predictions, etc.), you should see a several-sigma strategy.
It only takes a very, very small edge to make a very low-risk strategy if you can diversify.
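A rough sketch of why breadth matters, assuming a hypothetical equal-weight book where each position independently returns +1 if the prediction is right and -1 if it is wrong in a given period. Independence across names is the key (and optimistic) assumption; a market-neutral construction aims to strip out the common market factor, but residual correlations would shrink these numbers. The function `portfolio_sigma` is illustrative only.

```python
import math


def portfolio_sigma(hit_rate: float, n_securities: int) -> float:
    """Expected one-period P&L of an equal-weight book divided by its std.

    Toy model: each of `n_securities` positions independently returns +1
    (prediction right) or -1 (wrong), and all positions are equally weighted.
    """
    edge = 2 * hit_rate - 1                # mean return of one position
    single_std = math.sqrt(1 - edge ** 2)  # std of one +1/-1 position
    # Averaging n independent positions keeps the mean at `edge` but shrinks
    # the std by sqrt(n), so the mean/std ratio grows like sqrt(n).
    return edge / (single_std / math.sqrt(n_securities))


for n in (1, 100, 1000, 3000):
    print(f"{n:>5} securities at 55%: {portfolio_sigma(0.55, n):.2f} sigma per period")
```

With these assumptions a single name at 55% is only about 0.1 sigma per period, while 1,000 independent names give roughly 3.2 sigma and 3,000 give about 5.5.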
https://en.wikipedia.org/wiki/Signal_averaging
Now add on top of that the fact that they will have several low-SNR prediction signals, and the effects of signal averaging become even greater.
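A minimal illustration of signal averaging, assuming each of `k` signals is the same true value plus independent Gaussian noise with unit std, so a single signal's SNR equals the true value. Averaging `k` such signals cuts the noise std by sqrt(k), which raises the probability that the averaged signal has the right sign. Independent noise across signals is an assumption; correlated signals average out less. The function names are hypothetical, and `simulate` is just a Monte Carlo check of the formula.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def hit_rate_after_averaging(snr: float, k: int) -> float:
    """P(average of k signals > 0) when each signal = snr + N(0, 1) noise.

    The average has mean `snr` and noise std 1/sqrt(k), so its SNR is
    snr * sqrt(k).
    """
    return normal_cdf(snr * math.sqrt(k))


def simulate(snr: float, k: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        avg = sum(snr + rng.gauss(0.0, 1.0) for _ in range(k)) / k
        if avg > 0:
            hits += 1
    return hits / trials


for k in (1, 4, 16):
    print(f"k={k:>2}: formula {hit_rate_after_averaging(0.1, k):.3f}, "
          f"simulated {simulate(0.1, k):.3f}")
```

In this toy setup a single signal with SNR 0.1 is right about 54% of the time, while the average of 16 independent such signals is right about 66% of the time.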
I'm also a "quant fund insider", as you put it...