> Interesting that even the most naive methods still have >50% accuracy.

All methods that you'll ever see have >50% accuracy, because if you find a signal with <50% accuracy you'll just flip the sign of the signal and call it >50% accuracy.

Here's a bit relevant to this conversation:

> Previous work on predicting the directionality of Bitcoin prices has shown that significant signal exists in the price of the cryptocurrency. Hegazy and Mumford (2016) compute an exponentially-smoothed Bitcoin price every eight minutes; using the first five left derivatives of this price as features in a decision-tree based algorithm, they predict the direction of the next change in Bitcoin price with 57.11% accuracy.

> Their results substantiate earlier research done by Madan, Saluja, and Zhao (2014), who found that by using the Bitcoin price sampled every 10 minutes as the primary feature for a random-forest model, they could predict the direction of the next change in Bitcoin price with 57.4% accuracy.

> An alternative model was used by Sebastian, Katabarwa, and Li (2014), who use the Bitcoin price sampled every minute as the primary feature for a forward-feed neural network. Their results suggest that this system predicts future Bitcoin price directionality with 60% accuracy.

The most glaring evidence that this entire paper is garbage is that zero time is spent putting these numbers (57.11%, 57.4%, 60%) in context. What do I mean by context? For example, observations like this one: on the same dataset, if you use daily resolution and always predict "up", you'll beat those accuracies. Obviously, that discussion is absent because it's a lot harder than just dumping a dataset into sklearn.
Don't get me wrong, sklearn+CSV is great for learning, and great for initial experimentation or playing around. But it's just too far from the real process to be meaningful imo.
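To make the "context" point concrete, here's a toy sketch. The price series and the "signal" below are made up for illustration (hypothetical, not from any of the cited papers or real Bitcoin data). It shows two things: a signal that's right 40% of the time becomes "60% accurate" once you flip it, and that 60% still loses to just predicting "up" every step when most moves in the sample are up.

```python
# Toy sketch: compare a directional signal against an always-"up" baseline.
# The prices and the signal are hypothetical, chosen only to illustrate the point.

def directions(prices):
    """1 if the next price is strictly higher, else 0 (flat counts as not-up)."""
    return [1 if b > a else 0 for a, b in zip(prices, prices[1:])]

def accuracy(predictions, actual):
    """Fraction of predicted directions that match the actual ones."""
    assert len(predictions) == len(actual)
    return sum(p == y for p, y in zip(predictions, actual)) / len(actual)

def always_up_baseline(actual):
    """Accuracy of the naive rule 'always predict up'."""
    return accuracy([1] * len(actual), actual)

def flip(predictions):
    """Invert every prediction: an X% signal becomes a (100 - X)% signal."""
    return [1 - p for p in predictions]

prices = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115, 118]
actual = directions(prices)                # 7 up moves out of 10
signal = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]    # hypothetical model output

print(f"raw signal:  {accuracy(signal, actual):.2f}")        # 0.40
print(f"flipped:     {accuracy(flip(signal), actual):.2f}")  # 0.60
print(f"always up:   {always_up_baseline(actual):.2f}")      # 0.70
```

A headline accuracy only means something next to a baseline like this one, computed on the same data at the same resolution.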