by tempodox 4178 days ago
This reminds me how big a hole there is in my knowledge of statistics and such. I built myself a Twitter client that sucks users' geolocations into a DB so I can do all kinds of analyses on their movements. Makes me wish we'd had statistics classes back in school. They should come right after learning the ABCs.

Even the original Twitter blog post seems to have gone out of its way to make this seem more complex than it is...

Decomposition of the time series is done with STL (the stl function in R's stats package), and this is the first part of what they call "Seasonal Hybrid ESD (S-H-ESD)" (sounds impressive, right?). The rest apparently just involves finding the point with the largest absolute difference from the sample mean of the detrended series, measured in standard deviations, removing it, and repeating until you have your collection of x outliers. If they wanted to, they could explain this in a few sentences, and the underlying code is really simple [0], but for whatever reason it's been written up as advanced analytics, as if decomposing a time series were a major challenge.

[0] https://github.com/twitter/AnomalyDetection/blob/master/R/de...
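A minimal Python sketch of the two steps described above, assuming evenly spaced data with a known period. It is not Twitter's code. The hypothetical `seasonal_residuals` helper uses per-phase medians as a crude stand-in for STL (the real package relies on R's stl). The hypothetical `iterative_esd` helper repeats the remove-the-most-extreme-point step a fixed number of times and skips the t-distribution critical values that the full generalized ESD test uses to decide which candidates actually count as anomalies. The "hybrid" part of S-H-ESD is generally described as swapping the mean and standard deviation for the median and MAD; this sketch follows the mean/SD description in the comment.

```python
import statistics


def seasonal_residuals(series, period):
    # Crude stand-in for STL (NOT STL itself): estimate each seasonal phase
    # by its median, subtract it, then subtract the median of what is left.
    phase_medians = [statistics.median(series[p::period]) for p in range(period)]
    deseasoned = [x - phase_medians[i % period] for i, x in enumerate(series)]
    level = statistics.median(deseasoned)
    return [d - level for d in deseasoned]


def iterative_esd(values, max_outliers):
    # Repeatedly flag the point farthest from the sample mean, measured in
    # sample standard deviations, remove it, and recompute on what remains.
    # Omits the critical-value test of the full generalized ESD procedure.
    remaining = list(enumerate(values))  # keep original indices
    flagged = []
    for _ in range(max_outliers):
        if len(remaining) < 3:
            break
        xs = [v for _, v in remaining]
        mean = statistics.fmean(xs)
        sd = statistics.stdev(xs)
        if sd == 0:
            break  # every remaining point is identical; nothing stands out
        pos = max(range(len(remaining)), key=lambda j: abs(remaining[j][1] - mean))
        idx, value = remaining.pop(pos)
        flagged.append((idx, abs(value - mean) / sd))
    return flagged


series = [10, 20, 30, 20] * 6
series[9] += 50  # inject one spike into an otherwise perfectly seasonal series
print(iterative_esd(seasonal_residuals(series, period=4), max_outliers=3))
```

Each flagged entry is an (index, score) pair, where the score is the deviation in standard deviations at the moment the point was removed.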

While the computation might be relatively simple, it's still necessary to be aware of the literature and to use the proper academic description for the methods.
Time series analysis comes quite late in most statistics syllabi. I did quite a bit of statistics at school, and it wasn't until my third year of Mathematics & Statistics at university that we touched time series data (although I think it could have been taken as a second-year module).
There's a TED talk you may enjoy in which the speaker (I believe Arthur Benjamin) argues that statistics should be taught before calculus. Statistics does depend on calculus to some extent (e.g. the standard derivation of least-squares linear regression makes no sense without differential calculus), but I find that I use linear algebra and statistics far more in my work, in daily life, and when reading about topics that interest me. There was no mention of those branches of math in my high school; I had to learn about them on my own, and I feel they're more valuable. I don't think you need to drop trigonometry and calculus to make room, though. I started my education in New Zealand, and I believe that by 7th form (the final year of high school) students there have done both statistics and calculus, and more linear algebra than students in the US have.
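For simple linear regression, that calculus dependency looks like this: pick the intercept a and slope b that minimize the sum of squared errors by setting both partial derivatives to zero, then substitute the first result into the second. This is the standard textbook derivation, added here only as an illustration.

```latex
S(a,b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \;\Rightarrow\; a = \bar{y} - b\,\bar{x}

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \;\Rightarrow\; b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```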
I have a talk in which I argue that if you can program, you have a huge advantage in learning statistics because you can simulate random processes and tinker with them to get an intuitive sense of the statistics involved.

At Data Driven NYC: https://www.youtube.com/watch?v=AfSM45ncAT8
Keynote at Strata+Hadoop World 2014: https://www.youtube.com/watch?v=5Dnw46eC-0o
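A small, hypothetical example of that simulate-and-tinker approach (not taken from the talks): estimate by simulation how often a fair coin gives at least 60 heads in 100 flips, then check the estimate against the exact binomial tail. The function names and parameters are illustrative only; changing `n_flips`, `threshold`, or `trials` and watching the estimate move is the kind of tinkering described.

```python
import math
import random


def simulate_tail(n_flips, threshold, trials, seed=0):
    # Monte Carlo: run `trials` experiments of `n_flips` fair coin flips each
    # and return the fraction that produced at least `threshold` heads.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= threshold:
            hits += 1
    return hits / trials


def exact_tail(n_flips, threshold):
    # Exact binomial tail P(X >= threshold) for a fair coin.
    favorable = sum(math.comb(n_flips, k) for k in range(threshold, n_flips + 1))
    return favorable / 2 ** n_flips


print("simulated:", simulate_tail(100, 60, trials=10_000))
print("exact:    ", exact_tail(100, 60))
```

With a fixed seed the run is reproducible, and raising `trials` shrinks the typical gap between the two numbers roughly in proportion to 1/sqrt(trials).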

I just signed up for a statistics class at my local university. I will officially be an undergrad student again on Jan 20, and I intend to learn much, much more this time.

Luckily, my employer encourages learning, and it helps that the class is mostly during lunch.

Keep on learning!