Hacker News
by fwdpropaganda 2935 days ago
> Interesting that even the most naive methods still have >50% accuracy.

All methods that you'll ever see have >50% accuracy, because if you find a signal with <50% accuracy you'll just flip the sign of the signal and call it >50% accuracy.
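To make the sign-flip point concrete, here's a minimal sketch in Python. The `accuracy` helper and the toy up/down lists are made up for illustration (nothing here comes from the paper), and it assumes a purely binary up/down target with no flat periods.

```python
def accuracy(predictions, actual):
    """Fraction of predictions that match the actual direction."""
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


# Hypothetical toy data: +1 means "up", -1 means "down".
actual = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1]
signal = [-1, 1, -1, 1, 1, -1, 1, -1, -1, -1]  # a "bad" signal

acc = accuracy(signal, actual)

# Flipping the sign turns every miss into a hit and vice versa,
# so the flipped signal scores exactly 1 - acc.
flipped = [-s for s in signal]
acc_flipped = accuracy(flipped, actual)

print(f"original: {acc:.0%}, flipped: {acc_flipped:.0%}")
```

So for a binary target, the real question is never whether a number is above 50%, but how far above the trivial baselines it is.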

Here's a bit relevant to this conversation:

> Previous work on predicting the directionality of Bitcoin prices has shown that significant signal exists in the price of the cryptocurrency. Hegazy and Mumford (2016) compute an exponentially-smoothed Bitcoin price every eight minutes; using the first five left derivatives of this price as features in a decision-tree based algorithm, they predict the direction of the next change in Bitcoin price with 57.11% accuracy.

> Their results substantiate earlier research done by Madan, Saluja, and Zhao (2014), who found that by using the Bitcoin price sampled every 10 minutes as the primary feature for a random-forest model, they could predict the direction of the next change in Bitcoin price with 57.4% accuracy.

> An alternative model was used by Sebastian, Katabarwa, and Li (2014), who use the Bitcoin price sampled every minute as the primary feature for a forward-feed neural network. Their results suggest that this system predicts future Bitcoin price directionality with 60% accuracy.

The most glaring evidence that this entire paper is garbage is the fact that zero time is spent on putting these numbers (57.11%, 57.4%, 60%) in context. What do I mean by context? For example, observations like this one: on the same dataset, if you use a daily resolution and always predict "up", you'll beat those accuracies. Obviously, the reason this discussion is absent is that it's a lot harder than just dumping a dataset into sklearn.
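A rough sketch of the kind of baseline check I mean, in Python. The price series is a deliberately stylized toy (up 2, down 1, repeated), not Bitcoin data, and `always_up_accuracy` is a hypothetical helper name. It only shows the mechanism: with upward drift, the "always up" baseline changes with the sampling resolution.

```python
def always_up_accuracy(prices):
    """Accuracy of predicting "up" for every step: the fraction of up moves."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    ups = sum(1 for c in changes if c > 0)
    return ups / len(changes)


# Toy series (hypothetical): alternate +2 and -1, so there is upward drift.
prices = [100]
for i in range(20):
    prices.append(prices[-1] + (2 if i % 2 == 0 else -1))

fine = always_up_accuracy(prices)         # every step
coarse = always_up_accuracy(prices[::2])  # every second step

print(f"always-up accuracy, fine resolution:   {fine:.0%}")
print(f"always-up accuracy, coarse resolution: {coarse:.0%}")
```

Any reported accuracy should sit next to this number, computed on the same data at the same resolution.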


In order to meaningfully test this stuff you have to recreate a simulation as close as possible to the real trading environment -- and even then -- this is extremely hard to do. The lag, downtime, transaction fees, failed trades, API changes, etc, all throw a huge huge wrench in this theoretical sklearn+CSV 'prediction' game.

Don't get me wrong, sklearn+CSV is great for learning, and great for initial experimentation or playing around. But it's just too far from the real process to be meaningful imo.
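To give a feel for how much just one of those frictions matters, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: every move has the same size `move` whether you're right or wrong, `fee` is the round-trip cost per trade, and the numbers are invented. Lag, downtime and failed trades are not modeled at all.

```python
def expected_net_return(p_correct, move, fee):
    """Expected return per trade: win `move` with probability p_correct,
    lose `move` otherwise, and pay `fee` every time."""
    return p_correct * move - (1 - p_correct) * move - fee


def break_even_accuracy(move, fee):
    """Accuracy at which the expected net return is exactly zero."""
    return 0.5 + fee / (2 * move)


# Hypothetical numbers: 0.2% typical move, 0.1% round-trip fee.
move, fee = 0.002, 0.001

needed = break_even_accuracy(move, fee)
at_60 = expected_net_return(0.60, move, fee)

print(f"break-even accuracy: {needed:.0%}")
print(f"expected net return per trade at 60% accuracy: {at_60:.4%}")
```

Under these made-up numbers a 60% directional accuracy still loses money per trade, which is why accuracy on a CSV tells you so little on its own.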

I presume that the context of those accuracy numbers is prediction for the next period (i.e. 8 minutes, 10 minutes, 1 minute). To me 60% sounds good, but apparently not to people in this thread :)

> I presume that the context of those accuracy numbers is prediction for the next period (i.e. 8 minutes, 10 minutes, 1 minute).

That's not a context. That's the statistic they're calculating.

> To me 60% sounds good, but apparently not to people in this thread :)

What does "good" mean, and compared to what? If you can't put a number on "good" you're flying blind.

But hey, I'm just trying to warn people that the maths are garbage here; but if that looks good enough for you go ahead and trade it.

A random generator has 50% prediction accuracy. For a series like ether, where the price was rising on average, a constant predictor of “will rise” would have similar accuracy.