
by fwdpropaganda 2935 days ago

> Interesting that even the most naive methods still have >50% accuracy.

All methods that you'll ever see have >50% accuracy, because if you find a signal with <50% accuracy you'll just flip the sign of the signal and call it >50% accuracy.
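A minimal sketch of the sign flip, on made-up data: the outcomes are coin flips and `bad_signal` is a hypothetical signal built to agree with them only about 40% of the time. Nothing here comes from the paper.

```python
import random

def accuracy(predictions, outcomes):
    """Fraction of predictions that match the realized direction."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)

def flip(predictions):
    """Invert a binary up/down signal (+1 becomes -1 and vice versa)."""
    return [-p for p in predictions]

random.seed(0)
# Synthetic outcomes: +1 means "up", -1 means "down".
outcomes = [random.choice([-1, 1]) for _ in range(10_000)]
# Hypothetical "bad" signal: agrees with the outcome only ~40% of the time.
bad_signal = [o if random.random() < 0.40 else -o for o in outcomes]

acc = accuracy(bad_signal, outcomes)
flipped_acc = accuracy(flip(bad_signal), outcomes)
print(f"original: {acc:.3f}, flipped: {flipped_acc:.3f}")
```

For a binary signal the two accuracies always sum to 1, so anything below 50% becomes something above 50% for free.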

Here's a bit relevant to this conversation:

> Previous work on predicting the directionality of Bitcoin prices has shown that significant signal exists in the price of the cryptocurrency. Hegazy and Mumford (2016) compute an exponentially-smoothed Bitcoin price every eight minutes; using the first five left derivatives of this price as features in a decision-tree based algorithm, they predict the direction of the next change in Bitcoin price with 57.11% accuracy.

> Their results substantiate earlier research done by Madan, Saluja, and Zhao (2014), who found that by using the Bitcoin price sampled every 10 minutes as the primary feature for a random-forest model, they could predict the direction of the next change in Bitcoin price with 57.4% accuracy.

> An alternative model was used by Sebastian, Katabarwa, and Li (2014), who use the Bitcoin price sampled every minute as the primary feature for a forward-feed neural network. Their results suggest that this system predicts future Bitcoin price directionality with 60% accuracy.

The most glaring evidence that this entire paper is garbage is that zero time is spent putting these numbers (57.11%, 57.4%, 60%) in context. What do I mean by context? For example, the observation that, for the same dataset, if you use a daily resolution and always predict "up", you'll beat those accuracies. Obviously, this discussion is absent because it's a lot harder than just dumping a dataset into sklearn.
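For reference, the always-"up" baseline is only a few lines to compute. This is a sketch on a made-up price list, not the paper's dataset; `direction_labels` and `always_up_accuracy` are names chosen for illustration.

```python
def direction_labels(prices):
    """+1 if the next price is higher, else -1 (ties count as down)."""
    return [1 if nxt > cur else -1 for cur, nxt in zip(prices, prices[1:])]

def always_up_accuracy(prices):
    """Accuracy of a predictor that says "up" for every period."""
    labels = direction_labels(prices)
    return sum(label == 1 for label in labels) / len(labels)

# Made-up closing prices with an upward drift, for illustration only.
prices = [100, 102, 101, 105, 107, 106, 110, 113, 112, 118, 121]
baseline = always_up_accuracy(prices)
print(f"always-up baseline: {baseline:.0%}")
```

Any reported accuracy only means something relative to this number, computed on the same data at the same resolution.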

natalyarostova 2934 days ago

To test this stuff meaningfully you have to build a simulation as close as possible to the real trading environment, and even that is extremely hard to do. Lag, downtime, transaction fees, failed trades, API changes, etc. all throw a huge wrench into this theoretical sklearn+CSV 'prediction' game.

Don't get me wrong, sklearn+CSV is great for learning, and great for initial experimentation or playing around. But it's just too far from the real process to be meaningful imo.
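To make just the fee point concrete, here is a back-of-the-envelope sketch. It assumes every trade wins or loses the same average move and pays a flat fee on entry and on exit; the 0.5% move and 0.1% fee are invented numbers, and lag, downtime, and failed trades are ignored entirely.

```python
def expected_return_per_trade(accuracy, avg_move, fee_per_side):
    """Expected net return of one round trip, assuming symmetric wins and losses."""
    gross = (2 * accuracy - 1) * avg_move  # win avg_move with prob. accuracy, lose it otherwise
    return gross - 2 * fee_per_side        # pay the fee on entry and on exit

def break_even_accuracy(avg_move, fee_per_side):
    """Accuracy at which the expected net return is exactly zero."""
    return 0.5 + fee_per_side / avg_move

# Hypothetical numbers: 60% accuracy, 0.5% average move, 0.1% fee per side.
net = expected_return_per_trade(0.60, 0.005, 0.001)
needed = break_even_accuracy(0.005, 0.001)
print(f"net per trade: {net:+.4%}, break-even accuracy: {needed:.0%}")
```

Under these assumptions a 60% hit rate still loses money per trade, and the real environment only adds more costs on top.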

anjc 2935 days ago

I presume that the context of those accuracy numbers is prediction for the next period (i.e. 8 minutes, 10 minutes, 1 minute). To me 60% sounds good, but apparently not to people in this thread :)

fwdpropaganda 2935 days ago

> I presume that the context of those accuracy numbers is prediction for the next period (i.e. 8 minutes, 10 minutes, 1 minute).

That's not a context. That's the statistic they're calculating.

> To me 60% sounds good, but apparently not to people in this thread :)

What does "good" mean, and compared to what? If you can't put a number on "good" you're flying blind.

But hey, I'm just trying to warn people that the maths are garbage here; but if that looks good enough for you go ahead and trade it.
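One rough way to put a number on "good" is to ask how far the reported accuracy sits above a baseline hit rate, in standard errors. The sketch below uses a normal approximation and assumes independent predictions, which is optimistic for price series; the sample size of 1000 and the 58% baseline are made up.

```python
import math

def z_score_vs_baseline(accuracy, baseline, n):
    """Normal-approximation z-score of an accuracy against a baseline hit rate."""
    standard_error = math.sqrt(baseline * (1 - baseline) / n)
    return (accuracy - baseline) / standard_error

# Hypothetical: 60% accuracy measured over 1000 predictions.
z_coin = z_score_vs_baseline(0.60, 0.50, 1000)   # against a coin flip
z_drift = z_score_vs_baseline(0.60, 0.58, 1000)  # against an assumed 58% always-up baseline
print(f"vs coin flip: {z_coin:.2f}, vs always-up: {z_drift:.2f}")
```

The same 60% looks impressive against a coin flip and unremarkable against a drifting baseline, which is why the choice of comparison matters.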

mirekrusin 2935 days ago

A random generator has 50% prediction accuracy. For a series like ether, where the price was rising on average, a constant predictor of “will rise” would have similar accuracy.
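A quick simulation of both predictors, as a sketch: the series is synthetic, and the 55% share of "up" periods is an assumption picked for illustration, not a measured figure for ether.

```python
import random

def accuracy(predictions, outcomes):
    """Fraction of predictions that match the realized direction."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

random.seed(1)
n = 20_000
p_up = 0.55  # assumed share of "up" periods in a rising series (made up)

# Synthetic directions: +1 for a rise, -1 for a fall.
outcomes = [1 if random.random() < p_up else -1 for _ in range(n)]
coin_flip = [random.choice([-1, 1]) for _ in range(n)]  # random generator
always_rise = [1] * n                                   # constant "will rise"

coin_acc = accuracy(coin_flip, outcomes)
rise_acc = accuracy(always_rise, outcomes)
print(f"coin flip: {coin_acc:.3f}, always 'will rise': {rise_acc:.3f}")
```

The constant predictor's accuracy is just the fraction of rising periods, so the stronger the drift, the higher the bar any model has to clear.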
