LSTM Neural Networks for Time Series Prediction | HN Mirror


	LSTM Neural Networks for Time Series Prediction (jakob-aungiers.com)
	111 points by shivinski 3468 days ago

6 comments

achompas 3468 days ago

The key with autocorrelated models is to benchmark them against a naive alternative. I really appreciate Jakob's point that the LSTM might simply be modeling one step ahead using the current data point and Gaussian noise. Such a candid assessment is important in applied work.
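A minimal sketch of such a benchmark, assuming a one-step-ahead setup: score any forecaster against the naive "persistence" forecast, which predicts x_{t+1} = x_t. The function names and the synthetic random walk are illustrative assumptions, not the article's code.

```python
import random

def mse(preds, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)

def persistence_forecast(series):
    """Naive baseline: predict that the next value equals the current one."""
    return series[:-1]

def skill_score(model_preds, series):
    """1 - MSE(model) / MSE(persistence); > 0 means the model beats the baseline.

    model_preds[i] is assumed to be the model's forecast of series[i + 1].
    """
    actual = series[1:]
    return 1.0 - mse(model_preds, actual) / mse(persistence_forecast(series), actual)

# Hypothetical random walk: its best one-step forecast is the last value itself.
rng = random.Random(0)
walk = [0.0]
for _ in range(1000):
    walk.append(walk[-1] + rng.gauss(0, 1))

# A "model" that just echoes the last value has exactly zero skill.
print(skill_score(walk[:-1], walk))  # 0.0
```

A model whose skill stays near zero on held-out data is, in effect, doing what the comment describes: reproducing the current point plus noise.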

I suspect that, similar to other time series applications, you'll find some interesting signals from exogenous effects. I wonder how LSTMs can incorporate this exogenous information for time series analysis.
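One common way (not the only one) to give an LSTM exogenous information is to stack it as extra input features at each timestep, so every training sample has shape (window, n_features). A stdlib-only sketch, assuming the exogenous series is aligned with the target and known up to the forecast time; `make_windows` and the price/volume data are hypothetical.

```python
def make_windows(target, exog, window):
    """Build (inputs, labels) for one-step-ahead forecasting.

    inputs[i] holds `window` timesteps; each timestep is the feature vector
    [target_t, exog_1_t, ..., exog_k_t]. labels[i] is the next target value.
    `exog` is a list of series, each aligned index-by-index with `target`.
    """
    inputs, labels = [], []
    for start in range(len(target) - window):
        steps = []
        for t in range(start, start + window):
            steps.append([target[t]] + [series[t] for series in exog])
        inputs.append(steps)
        labels.append(target[start + window])
    return inputs, labels

price = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
volume = [100, 120, 90, 150, 130, 110]  # hypothetical exogenous series
X, y = make_windows(price, [volume], window=3)
print(len(X), len(X[0]), len(X[0][0]))  # 3 3 2
print(y)  # [10.8, 11.0, 10.9]
```

The nested list maps directly onto the (samples, timesteps, features) layout that recurrent layers such as Keras' `LSTM` expect.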

rubyfan 3468 days ago

I feel like machine learning needs more content like this: practical applications of ML techniques with code samples, but more importantly with real source data, data manipulation, and a firm understanding of the desired output.

highd 3468 days ago

I'm unclear - aren't you showing results for training data anyway? The network might just be compressing the trends in the training data into its function. The question should be if the same network works on another time series, right?

I.e., I can build a lookup table that maps the last k data points to the next one, and you're going to need a massive training dataset before that stops working well on the training set (something like O(exp(k)*k/N) time series).
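To make the lookup-table point concrete, a minimal sketch (the `LookupTable` class is hypothetical): it memorizes every length-k window in the training data, so it scores essentially perfectly on the training set even when the series is pure noise, and it has nothing useful to say about windows it has never seen.

```python
import random

class LookupTable:
    """Memorizes window -> next value; has no way to generalize."""

    def __init__(self, k):
        self.k = k
        self.table = {}

    def fit(self, series):
        for i in range(len(series) - self.k):
            self.table[tuple(series[i:i + self.k])] = series[i + self.k]
        return self

    def predict(self, window, default=0):
        # Unseen windows fall back to a fixed default guess.
        return self.table.get(tuple(window), default)

def accuracy(model, series):
    k = model.k
    hits = [model.predict(series[i:i + k]) == series[i + k]
            for i in range(len(series) - k)]
    return sum(hits) / len(hits)

rng = random.Random(1)
noise = [rng.randrange(8) for _ in range(2000)]  # i.i.d. noise with 8 values
train, test = noise[:1000], noise[1000:]
model = LookupTable(k=10).fit(train)
print(accuracy(model, train))  # close to 1.0: every training window is memorized
print(accuracy(model, test))   # close to chance (1/8): test windows are unseen
```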

rubyfan 3468 days ago

No, time series analysis doesn't work like that. You can't train on one series and then predict another series' outcome. When training on a time series, you are basically looking for a signal that is not portable to some other subject.

It's more than just compressing the training set: a human can use a training set to learn seasonality, volatility, streak durations, moving averages, etc., which are learnings that can be used to infer future movement. The LSTM is learning from its own observations to predict the next tick.
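For illustration, a few of the hand-crafted features mentioned above (moving average, volatility, streak duration), computed over a trailing window so that no future data leaks in. A stdlib-only sketch; the window length, the feature definitions and the `trailing_features` name are arbitrary assumptions.

```python
from statistics import mean, pstdev

def trailing_features(series, window=5):
    """Per-timestep features computed only from series[: t + 1]."""
    feats = []
    streak = 0  # +n: n consecutive up moves; -n: n consecutive non-up moves
    for t in range(len(series)):
        if t > 0:
            if series[t] > series[t - 1]:
                streak = streak + 1 if streak > 0 else 1
            else:  # flat moves are counted as down here
                streak = streak - 1 if streak < 0 else -1
        if t + 1 < window:
            continue  # not enough history for a full window yet
        recent = series[t + 1 - window:t + 1]
        changes = [b - a for a, b in zip(recent, recent[1:])]
        feats.append({
            "t": t,
            "moving_avg": mean(recent),
            "volatility": pstdev(changes),  # std of one-step changes
            "streak": streak,
        })
    return feats

feats = trailing_features([1, 2, 3, 2, 4, 5, 6], window=5)
print(feats[-1]["streak"])  # 3
```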

highd 3468 days ago

Time series can 100% work like that. If you expect your time series data to come from a similar distribution, that is what you do to train your LSTM. It's not just a magic box: you have to train it to encode useful features in its gates.

Sure, you may prefer to use subsets of a single time series instead of multiple time series. The issue remains that your performance on training data doesn't matter. You still need to partition your dataset into training and test data; otherwise, for all you know, the network could just be storing a lookup table. It looks like the author trained on the entire dataset and is then just reporting that performance...
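A minimal sketch of that partitioning, assuming one-step-ahead windows and a purely chronological split (the 80/20 fraction and the helper names are illustrative choices): every test target lies after every training target, so nothing the model is scored on was seen during training.

```python
def windows(series, k):
    """All (input window, next value) pairs with window length k."""
    return [(series[i:i + k], series[i + k]) for i in range(len(series) - k)]

def chronological_split(series, k, train_frac=0.8):
    """Split the raw series in time first, then window each part.

    Test windows start k steps before the cut so the first test target is
    series[cut]; their inputs are past data, which is allowed.
    """
    cut = int(len(series) * train_frac)
    train = windows(series[:cut], k)
    test = windows(series[cut - k:], k)
    return train, test

series = list(range(20))
train, test = chronological_split(series, k=3)
print(train[-1])  # ([12, 13, 14], 15)
print(test[0])    # ([13, 14, 15], 16)
```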

Let me put this another way. Suppose you do this with a random walk. Train on your entire time series, using every length-50 window. Say there are only 8 unique values at each timestep. That means there are 8^50 possible input sequences into the neural network. A sufficiently complex neural network can fit an arbitrary function, so if you only have on the order of a thousand windows, there is an astronomically large number of possible functions that predict every training output exactly (with 1,000 windows and 8 possible next values, 8^(8^50 - 1000) of them), and this is on noise! In all likelihood the neural net will learn that noise: https://arxiv.org/abs/1611.03530 Without comparing training and test results there's no way to know whether the neural network learned anything of value; it can get perfect accuracy on training data that's pure noise!
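The arithmetic behind this argument, as a quick check (8 values, window length 50 and 1,000 windows are the comment's hypothetical numbers): the training windows cover a vanishingly small fraction of the possible inputs, so training accuracy says almost nothing about behavior everywhere else.

```python
from math import log10

values, window, n_windows = 8, 50, 1000
possible_inputs = values ** window        # exact integer: 8**50 == 2**150
coverage = n_windows / possible_inputs    # fraction of inputs ever seen

print(f"{possible_inputs:.3e}")  # 1.427e+45
print(f"{coverage:.1e}")         # 7.0e-43

# Functions from inputs to the 8 possible next values that agree with all
# training windows: 8 ** (possible_inputs - n_windows). Far too large to
# compute directly, so report how many decimal digits that number has.
digits = (possible_inputs - n_windows) * log10(values)
print(f"{digits:.2e}")           # 1.29e+45
```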

This stuff is really critical to get right if you're doing machine learning.

rubyfan 3467 days ago

I was interpreting the parent comment to mean two different subjects. For example, I can't train on weather data from Paris, France and then expect the model to predict tomorrow's weather in Portland, Oregon. Am I wrong on that?

highd 3467 days ago

You may be able to do that. Which one you do is largely a matter of preference: if the two datasets share more structure, then it's more advantageous to share the network. There are also a bunch of hybrid approaches, e.g. pretraining on every city and then fine-tuning each one independently.
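The share-versus-separate trade-off can be illustrated without a neural network. This sketch swaps in a much simpler technique, assumed purely for illustration: fit one AR(1) coefficient pooled over all cities, then estimate each city's coefficient with a ridge penalty that pulls it toward the pooled value. The penalty `lam` loosely plays the role of how far "fine-tuning" may move from the shared model; all names and data are hypothetical.

```python
def pooled_ar1(all_series):
    """Least-squares a in x[t+1] ~ a * x[t], fit jointly over all series."""
    sxy = sum(x0 * x1 for s in all_series for x0, x1 in zip(s, s[1:]))
    sxx = sum(x0 * x0 for s in all_series for x0 in s[:-1])
    return sxy / sxx

def fine_tuned_ar1(series, a_pooled, lam):
    """Closed-form minimizer of sum (x[t+1] - a*x[t])^2 + lam * (a - a_pooled)^2.

    lam = 0 fits the series alone; a very large lam keeps the pooled value.
    """
    sxy = sum(x0 * x1 for x0, x1 in zip(series, series[1:]))
    sxx = sum(x0 * x0 for x0 in series[:-1])
    return (sxy + lam * a_pooled) / (sxx + lam)

# Hypothetical data: "paris" decays with a = 0.9, "portland" with a = 0.5.
paris = [1.0, 0.9, 0.81, 0.729]
portland = [1.0, 0.5, 0.25]

a0 = pooled_ar1([paris, portland])
print(round(a0, 3))                                      # 0.765
print(fine_tuned_ar1(portland, a0, lam=0.0))             # 0.5 (city alone)
print(round(fine_tuned_ar1(portland, a0, lam=1.25), 3))  # 0.633 (halfway)
```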

claytonjy 3467 days ago

When doing train/test splits in a time series context like this, would forward chain backtesting (train on steps 1:n, predict n+x) be enough validation, or would you advocate for further sampling of the training set?
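For reference, forward-chaining (walk-forward) backtesting as described in the question can be sketched as a generator of index ranges: train on steps [0, n), test on the next `horizon` steps, then advance n. The parameter names are illustrative; scikit-learn's `TimeSeriesSplit` implements a similar expanding-window scheme.

```python
def forward_chain_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) with training always before testing.

    The training window grows from `initial_train` observations; each test
    block is the `horizon` observations immediately after it.
    """
    step = step or horizon
    end = initial_train
    while end + horizon <= n_obs:
        yield range(0, end), range(end, end + horizon)
        end += step

for train_idx, test_idx in forward_chain_splits(10, initial_train=4, horizon=2):
    print(list(train_idx)[-1], list(test_idx))
# 3 [4, 5]
# 5 [6, 7]
# 7 [8, 9]
```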

highd 3467 days ago

If you're comparing deltas (i.e. x_{n+x} - x_{n+x-1}), that might be sufficient; otherwise it's hard to tell whether you're just capturing that x_{n+1} is close to x_{n}. The primary risk is that you're putting strong structure on the datasets you're testing with, so you could be misled. E.g., what if you have:
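A small sketch of why deltas matter, using a hypothetical trending series: the persistence forecast looks excellent when scored on levels, yet on one-step changes it has no skill at all, because it always predicts a change of zero. The series and the `r_squared` helper are illustrative assumptions.

```python
import math

def r_squared(preds, actual):
    """Coefficient of determination: 1 - SSE / total sum of squares."""
    mean_a = sum(actual) / len(actual)
    sse = sum((p - a) ** 2 for p, a in zip(preds, actual))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1 - sse / sst

x = [0.1 * t + math.sin(t) for t in range(200)]  # trend plus oscillation

# Persistence forecast: x[n+1] is predicted to equal x[n].
level_score = r_squared(x[:-1], x[1:])

# Same forecasts scored on deltas x[n+1] - x[n]: always a predicted change of 0.
actual_delta = [b - a for a, b in zip(x, x[1:])]
delta_score = r_squared([0.0] * len(actual_delta), actual_delta)

print(level_score, delta_score)  # level R^2 is close to 1; delta R^2 is <= 0
```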

  y = sin(t)   if 0 < t < 100π
    = sin(2t)  if 100π < t < 200π
    = sin(3t)  if 200π < t < 300π

You can imagine running into issues if you simply backtest the model just ahead of where you're training: each training iteration might fix a constant frequency in the network, so it looks like it works great at every iteration, but it never learns to determine each frequency on the fly. If the same thing happened with test windows sampled randomly from across the dataset, the backtest would show that only 1/3 of the test set fits.
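The sine example can be made concrete with a model that literally fixes a constant frequency. For samples spaced dt apart, sin(w(t+dt)) = 2cos(w*dt)*sin(wt) - sin(w(t-dt)), so a linear AR(2) predictor with coefficients (2cos(w*dt), -1) forecasts one frequency exactly. In this sketch the AR(2) model stands in for the LSTM, and the sampling interval and helper names are assumptions. A model fit on the first segment is essentially perfect just ahead of its training data, but fails on the later segment.

```python
import math

dt = 0.5  # sampling interval (assumed)
ts = [i * dt for i in range(int(300 * math.pi / dt))]

def y(t):
    """Piecewise signal from the comment: frequency 1, then 2, then 3."""
    if t < 100 * math.pi:
        return math.sin(t)
    if t < 200 * math.pi:
        return math.sin(2 * t)
    return math.sin(3 * t)

def fit_ar2(seg):
    """Least-squares c in x[n+1] + x[n-1] ~ c * x[n]; a pure sine gives 2cos(w*dt)."""
    idx = range(1, len(seg) - 1)
    num = sum((seg[n + 1] + seg[n - 1]) * seg[n] for n in idx)
    den = sum(seg[n] ** 2 for n in idx)
    return num / den

def ar2_mse(c, seg):
    """Mean squared one-step error of the forecast x[n+1] = c*x[n] - x[n-1]."""
    idx = range(1, len(seg) - 1)
    return sum((seg[n + 1] - (c * seg[n] - seg[n - 1])) ** 2 for n in idx) / len(idx)

seg1 = [y(t) for t in ts if t < 100 * math.pi]
seg3 = [y(t) for t in ts if t >= 200 * math.pi]
half = len(seg1) // 2

c = fit_ar2(seg1[:half])        # "train" on the first half of segment 1
near = ar2_mse(c, seg1[half:])  # test just ahead: ~0 (rounding error only)
far = ar2_mse(c, seg3)          # test on segment 3: roughly 1.3
print(near, far)
```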

The gold standard is always a well-partitioned dataset. And if you're going to hold a meeting describing your results, or deploy a product, it's really important that the results stand up to these sorts of questions.

huac 3468 days ago

There's a good comment on the article about using this kind of network to predict direction rather than return. I think that's where this would show the most promise: if you know the direction that a (liquid) asset or the market will go, then you can make money via a long/short strategy.
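Scoring direction rather than return is straightforward. A minimal sketch (the `directional_accuracy` helper is hypothetical) that counts how often the sign of the predicted change matches the sign of the actual change; on a pure random walk any forecaster should hover around 50%, which makes a useful sanity check.

```python
import random

def directional_accuracy(current, predicted_next, actual_next):
    """Fraction of steps where predicted and actual moves have the same sign.

    Steps where the actual value did not move are skipped.
    """
    hits = total = 0
    for c, p, a in zip(current, predicted_next, actual_next):
        actual_move = a - c
        if actual_move == 0:
            continue
        total += 1
        hits += (p - c) * actual_move > 0
    return hits / total

# Sanity check on a hypothetical random walk with a "momentum" forecaster
# that predicts the last move will repeat.
rng = random.Random(42)
walk = [0.0]
for _ in range(10_000):
    walk.append(walk[-1] + rng.choice([-1.0, 1.0]))

current = walk[1:-1]
predicted = [2 * walk[i] - walk[i - 1] for i in range(1, len(walk) - 1)]
actual = walk[2:]
acc = directional_accuracy(current, predicted, actual)
print(round(acc, 2))  # close to 0.5
```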

ge96 3468 days ago

So... can you or can you not, predict the stocks with ANN's... haha, guess I won't be quitting my job any time soon.

>A stock time series is unfortunately not a function that can be mapped.

1024core 3468 days ago

> So... can you or can you not, predict the stocks with ANN's... haha, guess I won't be quitting my job any time soon.

Not sure about "predicting" the stocks with ANNs, but how do you explain the Medallion Fund averaging about 30%/y (ballpark) without a single losing year since 1990?

https://www.bloomberg.com/news/articles/2016-11-21/how-renai...

argonaut 3468 days ago

Except these firms don't use LSTMs at their core. They do tons of feature engineering combined with statistical models and very simple machine learning models (think linear/logistic regression). Some firms will use more complicated black box models (which would include LSTMs), but only as one additional signal to combine with all their hand engineered features and financial models.

brobinson 3468 days ago

Or Virtu having a single losing day in a six year period?

https://www.bloomberg.com/news/articles/2015-02-20/high-freq...

I wonder what the Sharpe ratios of the trading systems these funds run are.

nomnombunty 3468 days ago

Virtu does high-frequency trading, so it makes sense that they don't have many down days. Also HFT strategies can have ridiculous sharpe ratios of like 100
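For context on figures like that: the annualized Sharpe ratio is commonly computed as mean excess return divided by its standard deviation, scaled by sqrt(252) for daily data (252 trading days per year is a convention). A sketch with hypothetical, very steady daily returns and a risk-free rate assumed to be zero shows how a small but consistent edge produces a triple-digit number.

```python
import math
from statistics import mean, stdev

def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Sharpe ratio of daily returns, annualized by sqrt(periods_per_year)."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)

# Hypothetical HFT-like profile: +0.10% per day on average, tiny variation.
returns = [0.0010 + 0.00015 * ((-1) ** i) for i in range(252)]
print(round(annualized_sharpe(returns)))  # 106
```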

makeset 3467 days ago

True, although conventional metrics like the Sharpe ratio or ROI are not very meaningful for HFT models, because those strategies can't scale with additional capital (you can safely assume they are already scaled to the max). Their returns are extremely consistent, but also ultimately limited in magnitude. Rather than magical money-making machines that have cracked the "secret code" of financial markets, HFT firms are essentially a fixed-cost utility service that reduces market inefficiency through improved price discovery.

brobinson 3468 days ago

>HFT strategies can have ridiculous sharpe ratios of like 100

Yeah, I've heard similar and it explains their basically linear equity curves...

imaginenore 3468 days ago

You most certainly can, just not with such a naïve approach. Jane Street is a company that does it, and they are quite open about their tech approach (just not about their specific models):

https://www.youtube.com/watch?v=hKcOkWzj0_s

Jane Street went from zero to trading $1 trillion in volume in just 15 years.

hippich 3468 days ago

I would say that you certainly cannot predict it. But given enough information (all stocks, news, etc.), a sufficiently complicated LSTM-like network might be able to beat you. As for the feasibility of building and computing it today, I have serious doubts :)

devonkim 3468 days ago

To be fair, all you really need to do in most stock market use cases is to predict measurably better than the competition to be able to get something of business value, not necessarily really close to "perfect" or within so much accuracy.

ge96 3468 days ago

Yeah I don't know anything about this stuff at this point. fantasy

Anyway thanks

techbio 3468 days ago

The LSTM shown will or will not predict the direction of a random walk. But it does OK with a sine function.