by laichzeit0 2853 days ago
I don't think it's entirely fair to say "Computer Scientists are basically rediscovering statistics". LSTMs are used for more than time series prediction. They are also quite common in language modelling, which is another sequence modelling task, and they work quite well there. I'm not at all familiar with GARCH/ARIMA being used for something like this.

Also, with neural networks it's very easy and natural to build complex models where different "layers" perform different tasks. An LSTM can very easily be extended to work bi-directionally (reading the sequence from both the beginning and the end), to add things like attention, or to run on word vectors fed into the recurrent network, or simply on characters.
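To make the bi-directional point concrete, here is a minimal pure-Python sketch, not taken from any paper or library. The helper names (`make_lstm_params`, `lstm_step`, `run_lstm`, `run_bidirectional`), the toy three-word vocabulary, the random "word vectors" and the sizes are all assumptions made up for illustration, and nothing is trained. It only shows that the bi-directional extension amounts to running a second LSTM over the reversed sequence and concatenating the states.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_lstm_params(input_size, hidden_size, rng):
    """Random, untrained weights for the four gates (hypothetical helper)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: {"W": matrix(hidden_size, input_size + hidden_size),
                   "b": [0.0] * hidden_size}
            for gate in ("i", "f", "g", "o")}

def affine(layer, v):
    # Matrix-vector product plus bias, on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(layer["W"], layer["b"])]

def lstm_step(params, x, h, c):
    v = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(z) for z in affine(params["i"], v)]    # input gate
    f = [sigmoid(z) for z in affine(params["f"], v)]    # forget gate
    g = [math.tanh(z) for z in affine(params["g"], v)]  # candidate cell values
    o = [sigmoid(z) for z in affine(params["o"], v)]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, xs, hidden_size):
    # Start from zero states and collect the hidden state at every step.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def run_bidirectional(fwd_params, bwd_params, xs, hidden_size):
    forward = run_lstm(fwd_params, xs, hidden_size)
    # Run the second LSTM over the reversed sequence, then re-align the outputs.
    backward = run_lstm(bwd_params, xs[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
# Toy "word vectors": random 3-dimensional embeddings for a made-up vocabulary.
embeddings = {tok: [rng.uniform(-1, 1) for _ in range(3)]
              for tok in ("the", "cat", "sat")}
xs = [embeddings[tok] for tok in ["the", "cat", "sat"]]

hidden_size = 4
fwd = make_lstm_params(3, hidden_size, rng)
bwd = make_lstm_params(3, hidden_size, rng)
states = run_bidirectional(fwd, bwd, xs, hidden_size)
print(len(states), len(states[0]))  # one state per token, 2 * hidden_size wide
```

Swapping the embedding lookup for per-character vectors would give the character-model variant without touching the recurrent code, which is the kind of composability the comment is pointing at.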

What are the statistical equivalents for this? I ask because most of the papers on this topic seem to come from Computer Science. Take a look at the epilogue of [1] for a thorough discussion of where statistical theory needs to catch up.

[1] Computer Age Statistical Inference - Efron, Hastie.


> What are the statistical equivalents for this?

That would be nonparametric statistics.

> That would be nonparametric statistics.

No, it wouldn't. Firstly, the term "nonparametrics" can be a little misleading. The most common instantiations place function ("process") priors on modeling decisions that would otherwise be made through trial and error. Those process priors do have their own parameters, though. But more importantly, LSTMs and neural networks are very much parametric - their success comes from the advances in computing and optimization that have enabled estimating these parameters in very complicated model structures.

The CS usage of "parametric" does not map one-to-one onto the statistical usage of "parametric".

Also, what you're describing is very similar to Bayesian statistics.

> But more importantly, LSTMs and neural networks are very much parametric - their success comes from the advances in computing and optimization that have enabled estimating these parameters in very complicated model structures.

To a statistician, that is basically a black box, and nonparametric, since you have no idea what the distribution is, dude, and there is no assumption of a distribution. Hence nonparametric statistics, which is the answer to the question you asked.

Why are you talking about priors? Nonparametric vs. parametric is an axis completely orthogonal to Bayesian vs. Frequentist.

We weren't talking about the "success", though. I was responding to the question "where in the body of stats literature would a neural net model lie".

I argue that would be non-parametric stats. In parametric stats, the ratio #params/#data goes to 0 in the limit as the amount of data grows. Statisticians and probabilists call models where this is not the case non-parametric (or, in certain cases, semi-parametric). Neural nets, especially the deep kind (and certainly not the single-layer kind), have the property that #params/#data is finite and large.
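A rough back-of-the-envelope sketch of that ratio, with every number made up for illustration: a hypothetical series of 1000 observations, an ARMA(2, 1) model counted as its coefficients plus an intercept and a noise variance, and a single-layer LSTM with 128 hidden units counted in the textbook way (four gates, one bias vector each). Other implementations count biases differently, so treat the exact figures as assumptions.

```python
def arma_param_count(p, q):
    # p autoregressive coefficients, q moving-average coefficients,
    # plus an intercept and a noise variance (one common way of counting).
    return p + q + 2

def lstm_param_count(input_size, hidden_size):
    # Four gates, each with a weight matrix over [input, hidden]
    # and a single bias vector (textbook formulation, an assumption here).
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

n_obs = 1000  # hypothetical number of observations

arma_params = arma_param_count(2, 1)
lstm_params = lstm_param_count(1, 128)

arma_ratio = arma_params / n_obs
lstm_ratio = lstm_params / n_obs

print(arma_params, arma_ratio)  # 5 parameters, ratio 0.005
print(lstm_params, lstm_ratio)  # 66560 parameters, ratio 66.56
```

This is only arithmetic for one fixed dataset size. The definition above is about how the ratio behaves as the data grows, which depends on whether the model is allowed to grow with it.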