by zwaps 2854 days ago
I find it interesting that Computer Scientists are basically rediscovering statistics.

Now when predicting time series, an issue is that most models (like ARIMA, GARCH, etc.) are short-memory processes. When you look at the full-series predictions of LSTMs, you observe the same thing.
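To make "short memory" concrete for GARCH(1,1): with sigma^2_{t+1} = omega + alpha * eps^2_t + beta * sigma^2_t and alpha + beta < 1, the h-step-ahead variance forecast reverts to the long-run level omega / (1 - alpha - beta) geometrically, shrinking the gap by a factor of (alpha + beta) per step. The sketch below computes those forecasts in plain Python. The parameter values are made up for illustration.

```python
def garch11_forecast(omega, alpha, beta, sigma2_next, horizon):
    """h-step-ahead conditional variance forecasts of a GARCH(1,1), h = 1..horizon.

    Uses E[sigma^2_{t+h}] = long_run + (alpha + beta)^(h-1) * (sigma^2_{t+1} - long_run).
    """
    persistence = alpha + beta
    if persistence >= 1:
        raise ValueError("alpha + beta must be < 1 for a covariance-stationary GARCH(1,1)")
    long_run = omega / (1 - persistence)  # unconditional variance
    return [long_run + persistence ** (h - 1) * (sigma2_next - long_run)
            for h in range(1, horizon + 1)]


# Hypothetical parameters, chosen only for illustration (alpha + beta = 0.98).
OMEGA, ALPHA, BETA = 0.05, 0.08, 0.90
forecasts = garch11_forecast(OMEGA, ALPHA, BETA, sigma2_next=5.0, horizon=250)
long_run = OMEGA / (1 - ALPHA - BETA)

# Today's elevated variance (5.0) fades toward the long-run level by a factor of 0.98 per step.
print(f"h=1: {forecasts[0]:.3f}  h=250: {forecasts[-1]:.3f}  long run: {long_run:.3f}")
```

Even with persistence as high as 0.98, the 250-step forecast is barely distinguishable from the unconditional variance. In that sense, the model quickly forgets the present.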

So in terms of time series, machine learning is currently in the mid-to-late '80s compared to financial econometrics.

So if you are a computer scientist, you should now probably take a look at fractional GARCH models and incorporate them into the LSTM logic. If the statistical issues are the same, this may give you that hot new paper.
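Fractional models get their long memory from the fractional difference operator (1 - L)^d. ARFIMA applies it to the mean and FIGARCH to the variance. The minimal sketch below assumes nothing beyond the binomial expansion. The coefficients of (1 - L)^d follow a simple recursion. With d = -0.4 they give the impulse response of an ARFIMA(0, 0.4, 0) process, which decays hyperbolically, whereas an AR(1) response phi^k decays geometrically. The values d = 0.4 and phi = 0.6 are illustrative only.

```python
def binomial_weights(d, n):
    """First n coefficients of (1 - L)^d = sum_k w_k L^k, via w_k = w_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# Impulse response of ARFIMA(0, d, 0): x_t = (1 - L)^(-d) e_t, here with d = 0.4.
impulse = binomial_weights(-0.4, 101)
# Impulse response of a stationary AR(1) with phi = 0.6: phi^k.
ar1_impulse = [0.6 ** k for k in range(101)]

print(f"lag 100, fractional (d=0.4): {impulse[100]:.2e}")  # decays like k^(d-1)
print(f"lag 100, AR(1) (phi=0.6):    {ar1_impulse[100]:.2e}")  # decays like phi^k
```

As a sanity check, `binomial_weights(1.0, 4)` gives the coefficients of ordinary first differencing, 1 - L.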


It's been amazing to watch CS (really the Python community, save statsmodels and patsy) discover statistics. For a while I thought perhaps it was me and statistics that were "behind." Over time I realized that it was mostly re-invention of old ideas: one-hot encoding = dummy variables, neural networks approximating polynomial regression, etc. I decided to double down on statistics and it's really paid off. NN / random forests and the stats-founded but CS-led approaches are very general models. That leaves statisticians a big opening, because a more specific model can be chosen to obtain more accurate predictions. These days I'm positioning myself to clean up the messes / save broken ML models. Turns out [stats] theory is very practical. :-)
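Here is a small plain-Python sketch of the one-hot / dummy-variable equivalence. The data are made up, and the helper names `one_hot` and `ols` are hypothetical. A full set of indicator columns without an intercept recovers the group means. Dropping one level and adding an intercept gives the classic econometrics dummy coding, where each coefficient is a difference from the baseline group.

```python
def one_hot(values, drop_first=False):
    """Encode a categorical list as 0/1 indicator columns (dummy variables)."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the dropped level becomes the baseline absorbed by the intercept
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return rows, levels


def ols(X, y):
    """Least squares via the normal equations X'X b = X'y, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


groups = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 3.0, 4.0, 6.0, 10.0, 12.0]  # made-up data; group means are 2, 5, 11

# Full one-hot, no intercept: coefficients are the group means.
X_full, full_levels = one_hot(groups)
full_coefs = ols(X_full, y)

# Drop-first dummies plus an intercept: baseline mean, then differences from it.
X_dummy, dummy_levels = one_hot(groups, drop_first=True)
dummy_coefs = ols([[1.0] + row for row in X_dummy], y)
print(full_levels, full_coefs, dummy_levels, dummy_coefs)
```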
Because saying "relevant username" is frowned upon I'll just point out that R A Fisher is "a genius who almost single-handedly created the foundations for modern statistical science"[0]

[0] https://en.m.wikipedia.org/wiki/Ronald_Fisher

It’s funny to me, as a professional statistician, because most methods popularized by Fisher et al. in the early 1900s are wildly inappropriate for practical problems, especially in policy decision science or causal inference.

I mean all the theory behind t-testing, Wald testing, using the derivatives of the log likelihood near the MLE point estimate to also estimate standard errors when no analytical solution exists, ANOVA, instrumental variables, etc.

It is in no sense exaggerated or incendiary to say that this whole collection of stuff is truly garbage statistics. It is insanely rife with counter-intuitive results. In common situations, minor violations of the assumptions can easily lead to statistically significant results of the wrong sign. And common practical needs are difficult or impossible to meet, such as model selection without doing a bunch of pairwise or subset selection calculations, or correcting for multicollinearity in large regressions where calculating something like variance inflation factors is totally intractable.
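For readers who haven't seen the curvature-based machinery criticized here, this is a minimal sketch of it. Standard errors come from the second derivative of the log likelihood at the MLE (the observed information), and those feed the Wald statistic. The sketch assumes an exponential model with made-up data. That model is chosen because the analytic answer, rate_hat / sqrt(n), is known, so the numeric Hessian can be checked against it.

```python
import math


def exp_loglik(rate, data):
    """Log likelihood of i.i.d. exponential data: n * log(rate) - rate * sum(x)."""
    return len(data) * math.log(rate) - rate * sum(data)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


data = [0.8, 1.9, 0.4, 2.7, 1.1, 0.6, 3.2, 1.5]  # made-up waiting times
rate_hat = len(data) / sum(data)  # closed-form MLE: 1 / sample mean

# Observed information = minus the curvature of the log likelihood at the MLE.
observed_info = -second_derivative(lambda r: exp_loglik(r, data), rate_hat)
se_numeric = 1 / math.sqrt(observed_info)
se_analytic = rate_hat / math.sqrt(len(data))  # since -l''(rate) = n / rate^2

wald_z = (rate_hat - 1.0) / se_numeric  # Wald statistic for H0: rate = 1
print(f"rate_hat={rate_hat:.4f}  se_numeric={se_numeric:.4f}  "
      f"se_analytic={se_analytic:.4f}  wald_z={wald_z:.2f}")
```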

Modern Bayesian approaches fully and entirely subsume these techniques. That holds not just for large data (in fact, Bayesian methods are more critical for small data), and not because of modern computing frameworks either. It holds because, from the very first principles of null-hypothesis significance testing (NHST), that whole field of stats/econometrics is fundamentally incapable of giving evidence or estimates that could address the very questions the field is based on.

NHST basically solves a type of inference problem that nobody ever actually has in reality, and its answer is almost never even approximately close enough to the real question to avoid being misleading.

NHST is like the stats analogue of JavaScript: a horrible historical accident that gained market traction despite being utterly and unequivocally a bad choice for the very problem domain it’s intended for. The historical accident of JavaScript's adoption and momentum is setting professional computer science back by decades, until it’s eventually replaced wholesale with something whose first principles are actually appropriate.

That same reckoning is in flux in many fields of statistics, as the fundamental unreliability of NHST estimation is more understood and drop-in Bayesian replacements are more available.
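A conjugate Beta-Binomial model is one of the simplest drop-in Bayesian replacements. Instead of a p-value for H0: theta = 0.5, the sketch below reports the posterior probability that theta exceeds 0.5. The flat Beta(1, 1) prior and the 14-of-20 data are assumptions for illustration. The exact tail uses the standard identity that links integer-parameter Beta CDFs to binomial tails.

```python
from math import comb


def beta_binomial_update(successes, trials, prior_a=1, prior_b=1):
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + s, b + trials - s)."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_sf_integer(a, b, x):
    """P(theta > x) for theta ~ Beta(a, b) with integer a, b.

    Uses P(theta <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    cdf = sum(comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(a, n + 1))
    return 1 - cdf


# Made-up data: 14 successes in 20 trials, flat Beta(1, 1) prior.
a_post, b_post = beta_binomial_update(14, 20)
posterior_mean = a_post / (a_post + b_post)
prob_above_half = beta_sf_integer(a_post, b_post, 0.5)
print(f"posterior Beta({a_post}, {b_post}), mean {posterior_mean:.3f}, "
      f"P(theta > 0.5 | data) = {prob_above_half:.3f}")
```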

I don't disagree with anything you've written. The only thing I'd take issue with is placing NHST at the feet of statisticians. Scientists deserve a fair share as well. :-p
... and someone who would be very difficult to "out-asshole" or to outdo in male chauvinism.

Those are criticisms of his personality. On the technical side, it took the community a long time to undo the damage of promoting non-robust parametric statistics. But this much is certain: he pulled statistics into the realm of math, which is no mean feat.

That's true and it's well documented. I think E.T. Jaynes gives a poignant reflection in his Probability Theory: The Logic of Science.
I know a handful of Econ phds working in data science; and Google, FB etc. have hired top economists as well.

The Phineas Gage of applied quantitative econ is demand estimation. You typically want to know the elasticity of quantity sold with respect to price, so as to inform pricing policies. But the problem is that causality is cloudy: low prices cause a decrease in supply, while high demand pushes prices up. So from observed prices and quantities alone, you never know whether you're looking at the demand curve or the supply curve.

People with a decent training in econometrics know how to treat this problem.
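One standard treatment is instrumental variables. The sketch below simulates a hypothetical market in which all coefficients are made up and the true demand slope is -1.5. An observed cost shifter z moves supply but not demand. OLS of quantity on price is biased because price responds to the demand shock. The single-instrument IV estimator cov(z, q) / cov(z, p) recovers the demand slope.

```python
import random


def cov(x, y):
    """Sample covariance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate_market(n, seed=0):
    """Hypothetical linear market; all coefficients are made up for illustration."""
    rng = random.Random(seed)
    prices, quantities, shifters = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)    # observed cost shifter: moves supply only (the instrument)
        u_d = rng.gauss(0, 1)  # unobserved demand shock
        u_s = rng.gauss(0, 1)  # unobserved supply shock
        # Demand: q = 10 - 1.5 p + u_d.  Supply: q = 2 + p - 2 z + u_s.
        # Setting them equal and solving gives the market-clearing price:
        p = (8 + 2 * z + u_d - u_s) / 2.5
        q = 10 - 1.5 * p + u_d
        prices.append(p)
        quantities.append(q)
        shifters.append(z)
    return prices, quantities, shifters


p, q, z = simulate_market(20_000)
ols_slope = cov(p, q) / cov(p, p)  # biased: price is correlated with the demand shock
iv_slope = cov(z, q) / cov(z, p)   # single-instrument IV estimator
print(f"true demand slope -1.5, OLS {ols_slope:.2f}, IV {iv_slope:.2f}")
```

In this setup the OLS slope is pulled toward zero, roughly to -1.08 in large samples. The IV estimate lands near -1.5 because z is correlated with price but excluded from the demand equation.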

I'm pretty sure orgs like Amazon were trying to do naive demand estimation, fell flat on their noses and copped to having to hire people who have thought about the underlying conceptual issues before.

I'm curious what resources you found useful to learn stats modelling and what sorts of approaches have been useful.

On one hand, it's almost a tautology that specific models should be better than general models. But I worked on some 2D time series classification with a statistician, and afterwards, for kicks, I replaced the entire thing with a CNN+LSTM. It worked just as well as the whole complicated model he had come up with.

I highly recommend this econometrics text for getting started with statistics: https://www.amazon.com/Principles-Econometrics-5th-Carter-Hi...

For modeling I found Wooldridge's panel and cross-section data book very useful: https://www.amazon.com/Econometric-Analysis-Cross-Section-Pa...

Greene is a really useful reference text: https://www.amazon.com/Econometric-Analysis-8th-William-Gree...

For advanced stats theory, I recommend Casella and Berger: https://www.amazon.com/Statistical-Inference-George-Casella/...

Hope that helps!

The more specific a model can be made to the problem at hand, the better it'll perform. Supervised ML models are great starting / baseline models.

I second Wooldridge. Greene I found to be much denser without providing much additional insight. It is a popular MS/PhD entry text though.

I'd add any of Ken Train's work to this mix, especially on estimating discrete choice models.
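For readers new to Train's material, the workhorse model is the multinomial logit. The minimal sketch below uses hypothetical cost and travel-time coefficients. Choice probabilities are a softmax of representative utilities. Adding a third alternative leaves the car/bus odds ratio unchanged. That is the IIA property, which nested logit, probit and mixed logit models relax.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit: P(j) = exp(V_j) / sum_k exp(V_k)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    expv = {alt: math.exp(v - m) for alt, v in utilities.items()}
    total = sum(expv.values())
    return {alt: e / total for alt, e in expv.items()}


def utility(cost, minutes, beta_cost=-0.5, beta_time=-0.05):
    """Hypothetical representative utility, linear in parameters: V = beta' x."""
    return beta_cost * cost + beta_time * minutes


alternatives = {"car": utility(6.0, 20), "bus": utility(2.0, 45)}
probs = logit_probabilities(alternatives)

alternatives["train"] = utility(3.0, 30)
probs3 = logit_probabilities(alternatives)

# IIA: the car/bus odds ratio is the same with or without the train option.
print(probs["car"] / probs["bus"], probs3["car"] / probs3["bus"])
```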

True.

On the other hand, the "more ignorant CS approach" has produced impressive achievements in language tasks (e.g., translation), visual tasks (e.g., image generation), game playing tasks (e.g., Go), agent-in-virtual-world tasks (e.g., DOTA), and robot-in-real-world tasks (e.g., self-driving cars).

Academic statistics departments often seem to be "20 years behind" on all those fronts...

No doubt that club of statistical significance held back many statisticians.
I don't think it's entirely fair to say "Computer Scientists are basically rediscovering statistics". LSTMs are used beyond just time series prediction. They are also quite common in language modelling, another sequence modelling task, where they work quite well. I'm not familiar at all with using GARCH/ARIMA for something like this.

Also, with neural networks it's very easy and natural to build complex models where different "layers" perform different tasks. So an LSTM can very easily be extended in several ways. It can be made bidirectional, reading the sequence both from the beginning and from the end. It can also add things like attention, use word vectors before the recurrent network, or use a character model instead.
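A bidirectional recurrent layer is easy to sketch. Run one recurrence left to right and another right to left, then pair their states at each position. To stay short, this sketch uses a scalar tanh (Elman) recurrence rather than LSTM cells. The fixed weights are hypothetical, not trained. The point is only that each position's output sees both the past and the future of the sequence.

```python
import math


def rnn_pass(inputs, w_in, w_rec, bias=0.0):
    """Scalar Elman recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias), starting from 0."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


def bidirectional(inputs, fwd=(0.8, 0.5), bwd=(0.6, 0.4)):
    """Pair a left-to-right pass with a right-to-left pass at every position.

    fwd and bwd are hypothetical (w_in, w_rec) pairs, not trained weights.
    """
    forward = rnn_pass(inputs, *fwd)
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]  # run reversed, then realign to original order
    return list(zip(forward, backward))


out = bidirectional([1.0, -0.5, 0.3])
out_changed_end = bidirectional([1.0, -0.5, 2.0])
# Position 0's forward state ignores the last input, but its backward state does not.
print(out[0], out_changed_end[0])
```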

What are the statistical equivalents for this? Because most of the papers on this topic seem to come from Computer Science. Take a look at the epilogue of [1] for a thorough discussion on where statistical theory needs to catch up.

[1] Computer Age Statistical Inference - Efron, Hastie.

> What are the statistical equivalents for this?

That would be nonparametric statistics.

> That would be nonparametric statistics.

No, it wouldn't. Firstly, the term "nonparametric" can be a little misleading. The most common instantiations place function ("process") priors on modeling decisions that are otherwise made through trial and error. Those process priors do have their own parameters, though. But more importantly, LSTMs and neural networks are very much parametric. Their success comes from the advances in computing and optimization that have enabled estimating these parameters in very complicated model structures.

Your CS sense of "parametric" doesn't map one-to-one onto the statistical usage of "parametric".

Also, what you're describing is very similar to Bayesian statistics.

> But more importantly, LSTMs and neural networks are very much parametric - their success come from the advances in computing and optimization that have enabled estimating these parameters in very complicated model structures.

Which, for a statistician, is basically a black box and nonparametric, since you have no idea what the distribution is, dude, and no distribution is assumed. Hence nonparametric statistics, which is the answer to the question you asked.

Why are you talking about priors? Nonparametric vs. parametric is an axis completely orthogonal to Bayesian vs. frequentist.

We weren't talking about the "success" though, I was responding to the question "where in the body of stats literature would a neural net model lie".

I argue that would be non-parametric stats. In parametric stats, the ratio #params/#data goes to 0 as the data grow. For models where this is not the case, statisticians and probabilists call them non-parametric (and in certain cases semi-parametric). Neural nets, especially the deep kind (and certainly not the single-layer kind), have the property that #params/#data is finite and large.
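Written out, the criterion looks like the block below. Here p is the number of parameters and n the number of observations (the notation is ours). For a fully connected network with layer widths d_0, ..., d_L, counting weights and biases gives the parameter count shown. The 784-512-512-10 architecture and n = 60,000 are a hypothetical, roughly MNIST-sized example, chosen only to show the ratio.

```latex
\begin{aligned}
&\text{parametric model, } p \text{ fixed:} && \frac{p}{n} \to 0 \quad (n \to \infty) \\
&\text{dense net, layer widths } d_0, \dots, d_L: && p = \sum_{l=1}^{L} (d_{l-1} + 1)\, d_l \\
&\text{e.g. } 784 \to 512 \to 512 \to 10: && p = 785 \cdot 512 + 513 \cdot 512 + 513 \cdot 10 = 669{,}706 \\
&\text{with } n = 60{,}000: && p / n \approx 11.2
\end{aligned}
```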

I agree with your sentiments, but there is one contribution from CS departments that the statistics, math and econ (as in econometrics) departments seem to have overlooked. I remember going to each of these departments in 2002 and asking why we didn't split the data sets, train and update the coefficients, and automate the process. The answer was always the same: "that's trivial and adds nothing to the field".
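What's described here is close to what is now usually called rolling-origin or walk-forward evaluation. The minimal sketch below assumes an AR(1) model fit by least squares and uses simulated data. At each step, it refits on everything observed so far, forecasts one step ahead, and collects the out-of-sample errors.

```python
import random


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def walk_forward_errors(series, initial_window):
    """Refit on data up to t-1, forecast y_t, record the error, move forward one step."""
    errors = []
    for t in range(initial_window, len(series)):
        c, phi = fit_ar1(series[:t])
        forecast = c + phi * series[t - 1]
        errors.append(series[t] - forecast)
    return errors


# Simulated AR(1) with phi = 0.7 and unit-variance shocks.
rng = random.Random(1)
y = [0.0]
for _ in range(299):
    y.append(0.7 * y[-1] + rng.gauss(0, 1))

errors = walk_forward_errors(y, initial_window=100)
oos_mse = sum(e * e for e in errors) / len(errors)
print(f"{len(errors)} one-step forecasts, out-of-sample MSE {oos_mse:.3f}")
```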
> why don’t we split the data sets to train and update the coefficients and automate the process.

What you just stated is just a pipeline. If you're talking about doing it in parallel, you can split the data, train on it, and automate it with tree ensembles that aren't boosted (e.g., bagging).
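Here is a sketch of the non-boosted tree ensemble route, i.e. bagging. Each model is fit on its own bootstrap resample, independently of the others, which is what makes it trivially parallel. Boosting, by contrast, fits its models sequentially. To keep the sketch short, the base learner is a depth-1 regression tree (a stump) rather than a full tree, and the data are simulated.

```python
import random


def fit_stump(xs, ys):
    """Depth-1 regression tree: the single split on x that minimizes squared error."""
    pairs = sorted(zip(xs, ys))
    total, n = sum(y for _, y in pairs), len(pairs)
    best, left_sum = None, 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        right_sum = total - left_sum
        # Minimizing SSE is equivalent to maximizing sum_L^2 / n_L + sum_R^2 / n_R.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    if best is None:  # all x identical: fall back to a constant prediction
        return pairs[0][0], total / n, total / n
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bagging: fit each stump on its own bootstrap resample, then average predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Each fit depends only on its own resample, so these iterations could run in parallel.
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(predict_stump(m, x) for m in models) / len(models)


# Simulated data: a step from 0 to 3 at x = 1, plus noise.
rng = random.Random(42)
xs = [i / 100 for i in range(200)]
ys = [(3.0 if x >= 1.0 else 0.0) + rng.gauss(0, 0.5) for x in xs]
predict = bagged_stumps(xs, ys)
print(round(predict(0.2), 2), round(predict(1.8), 2))
```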

If you're just saying split the data and run it as a batch process over different time intervals, you can do that with nonparametric Bayesian methods.

The CS contribution of creating deep learning, and making it the most accurate algorithm for certain data domains, is pretty nice. But again, stats cares about a lot more than prediction.

I think that ML is very useful, but remember that forecasting is really not the main objective of econometric models.

Basically, forecasting implies you have a good handle on all properties of the relevant distributions, which in my opinion is a lost cause in social sciences (think external validity).

Instead, econometrics is nowadays mainly concerned with identifying causal effects using non-parametric or semi-parametric approaches. Basically, you can believably estimate the direction of some mechanism, but you probably never have the data or model to make a good out-of-sample prediction. You can try, but it's basically implied that approaches that consistently estimate some marginal of a conditional expectation will NOT be that useful for predicting a whole stochastic process.

Also, using training and test sets kind of presupposes that your process is very stable. Otherwise the "test" set is not really a good test, is it? Again, in social sciences these things are hard to argue. You usually want to generalize some mechanism from this industry to that industry, not find a good predictor in the same industry. Test datasets still come from the same data!

ML is successful because in practice we DO care about prediction. This allows us to do all the cool things. Because econometrics/stats is so conservative and comes from a causal standpoint, people are just really shy about developing models for prediction (not true everywhere, but that's the gist). For ML, the primary question is basically how well the thing predicts. When I first tried scikit-learn way back, I was so confused that it didn't offer standard errors or some other statistical measure. But then I saw how ingrained the in-sample/out-of-sample process is, and I thought: well, that's really useful.

tl;dr: Stats and ML have different objectives, but there is a lot to learn in stats for ML

Nassim Taleb had some negative things to say about GARCH.

"GARCH does not work out of sample. It is a good story, but I was unable to use it in predicting squared deviations or mean deviations"

I haven't found it in Rob J Hyndman's forecasting tutorial either.

How does it fare in the Makridakis competitions?

You shouldn't listen to N. Taleb on technical matters. For the last decade or so, he's been a crank of the classic mold when it comes to anything serious, relegated instead to writing fluffy books on whatever he thinks is important.
GARCH, like I said, is a short-memory process and is inherently inadequate for (longer) out-of-sample predictions. Doing this is possible, but not really correct. Taleb is basically right, though of course what he says is probably inflammatory and half wrong, as usual.

Don't forget that most econometrics models are also concerned with identification and causality, less with prediction.

Apropos, here's a "Time series shootout: ARIMA vs. LSTM" : https://www.youtube.com/watch?v=h9QWefYBfJg&list=PL51yKFtVfM...