I am under the impression that to learn statistics one must first have a working knowledge of probability theory which rests upon grad level math analysis. Can machine learning be studied without any of that?
A lot of this is done in discrete math. You know, the actual probability is defined by this integral, but there is no closed form solution to the integral, so we do sums to find the approximate answer. Anyone can understand sums. And, it's probabilities, so the sums must equal one. Not that hard, right ;)
It sure helps to understand the integral equations, especially if you want to read the original literature. But realistically you are going to need to understand summing, normalizing, algorithms for clustering, and so on. You probably don't want to write your own numerical code anyway; someone else did it, and they handled all the edge cases that a naive implementation misses.
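To make the "summing and normalizing" point concrete, here is a minimal sketch in plain Python. The density `exp(-x**4)`, the interval, and the function names are made up for illustration and are not from any library: the continuous distribution is replaced by point masses on a grid, the integral becomes a sum, and dividing by that sum makes the probabilities add up to one.

```python
import math

def unnormalized_density(x):
    # Hypothetical example density, chosen only for illustration.
    return math.exp(-x ** 4)

def grid_probabilities(f, lo, hi, n):
    """Approximate a continuous distribution by n point masses on [lo, hi]."""
    width = (hi - lo) / n
    midpoints = [lo + (i + 0.5) * width for i in range(n)]
    weights = [f(x) * width for x in midpoints]
    total = sum(weights)  # midpoint Riemann sum standing in for the integral of f
    probs = [w / total for w in weights]  # normalize so the sum is one
    return midpoints, probs

xs, probs = grid_probabilities(unnormalized_density, -3.0, 3.0, 2000)

# Expectations and event probabilities are now just sums over the grid.
mean = sum(x * p for x, p in zip(xs, probs))
prob_positive = sum(p for x, p in zip(xs, probs) if x > 0)
```

Because the example density is symmetric around zero, `mean` should come out close to 0 and `prob_positive` close to 0.5, which is a quick sanity check on the discretization.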
You can find PDFs of the James, Witten, Hastie, Tibshirani book "An Introduction to Statistical Learning" [1]. Scroll on through - there is nothing intimidating math-wise. All the heavy lifting is left to R.
I don't really know what you're looking for. If it's a replacement for that Coursera ML class in Python, then I don't think there really is one. The basic tenets of ML aren't going to change depending on your language, though.
Thanks a lot for this! I didn't realize probabilities would be so important. I've been working with conditional expectations (not sure if they are relevant in machine learning), but it was an eye opener.
Other great introductions are the descriptive and inferential statistics courses on Udacity!
Conditional expectations are an important part of regression and of other scenarios where you might want to adjust a parameter estimate ("for every unit increase in x we get this much of a difference in y") for confounders. Generally, in machine learning, parameter estimates are not the (exclusive) basis for prediction; instead, you put data in and a prediction comes out, and what's in between is somewhat of a black box.
Well, technically we do know what's in the black box of course, it's just that for many methods it's not easy to summarize because there's so much happening under the hood. Leo Breiman (who invented random forests) gives some examples of how to do it, though: https://projecteuclid.org/euclid.ss/1009213726
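For the regression reading of conditional expectation mentioned above, a small sketch may help. The data points and function names below are hypothetical, and there is no confounder adjustment here: it only estimates E[Y | X = x] by averaging y within each x, then fits a least-squares line whose slope is the "difference in y for every unit increase in x".

```python
from collections import defaultdict

# Hypothetical (x, y) observations, made up for illustration.
data = [(0, 0.9), (0, 1.1), (1, 2.8), (1, 3.2),
        (2, 5.1), (2, 4.9), (3, 7.0), (3, 7.0)]

def conditional_means(pairs):
    """Estimate E[Y | X = x] by averaging y within each observed x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}

def least_squares_line(pairs):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

cond = conditional_means(data)
intercept, slope = least_squares_line(data)
```

With this toy data the group averages are roughly 1, 3, 5 and 7, and the fitted slope is about 2, so the line and the conditional means tell the same story. A black-box method would give you the predictions without handing you that single interpretable number.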
Like most fields, it depends on your definition of "studied." If you want to push the envelope in theoretical non-applied research, you're going to want to learn analysis & measure-theoretic probability theory. If you want to apply existing techniques, read (well-written) papers and code up the algorithms you find there, you can get away with undergraduate-level linear algebra & probability knowledge - Bayes' rule, expectations, independence, the general ability to think about random variables (and matrices thereof) as values that can be transformed and combined. And of course, you can fire up a classifier in scikit-learn without knowing any of this at all. But that's stretching the definition of "studied" quite a bit!
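As a taste of the undergraduate-level end of that range, here is Bayes' rule for a yes/no hypothesis as a few lines of Python. The spam scenario and all the numbers are assumptions picked for illustration, and `posterior` is a hypothetical helper, not a library function.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis H given one observation E.

    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    """
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence

# Hypothetical numbers: 1% of messages are spam, and a keyword appears
# in 95% of spam and in 5% of non-spam.
p_spam_given_keyword = posterior(prior=0.01,
                                 likelihood_if_true=0.95,
                                 likelihood_if_false=0.05)
print(round(p_spam_given_keyword, 3))  # prints 0.161
```

Even with a keyword that shows up in 95% of spam, the posterior is only about 16% because the prior is so low. That kind of reasoning is most of what "thinking about random variables" means at this level.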
I personally went into a graduate-level probabilistic machine learning course with probability knowledge consisting of an undergraduate course that followed Ross http://www.amazon.com/Introduction-Probability-Models-Tenth-... - so there's certainly no need to have been a math major. But if you've never dealt with random variables whatsoever, you'll hit a wall following research from the last 20 years.
There is applied machine learning (using machine learning to solve business problems) and theoretical machine learning (Optimization bounds, proofs, algorithm design).
With applied machine learning it is certainly possible to quickly get a working knowledge without too much reliance on statistics or difficult theory. You can compare this a bit with using a sorting function without knowing exactly how it works (but you know how fast it is and when to use it).
If you have an engineering background, take a look at the wide array of high-quality ML code and tools. Study trendy and powerful tools like XGBoost.
What do you mean, grad level math analysis? Much of probability theory can be learned with basic multivariate calculus. (Perhaps there's a terminology misunderstanding here - when I see "grad level" I think "grad school," i.e. master's/PhD). Certainly basic probability theory is a plus.
I agree with many of the responses here that mathematical analysis (epsilon-delta proofs, continuity, etc.) is not strictly necessary for statistics. But... it certainly will help.
The problem with dumping the measure-theoretic probability is that you won't really know what a random variable is. It has a definition (a measurable function into the reals), and without that, you will have a tendency to think of it as "a box that produces something random when you look into it". This will limit your ability to understand papers, and will make you insecure in talking to people.
Besides "random variable", other common notions will also be hard to understand without measure-theoretic probability, like "almost surely", convergence concepts, the difference between the SLLN and WLLN, etc.
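For readers who have not seen them, the notions named above can be written out in a few lines. This is the standard textbook form for i.i.d. variables with a finite mean, added here as a reference sketch rather than anything specific to this thread.

```latex
% Setup: (\Omega, \mathcal{F}, P) is a probability space.
% A real-valued random variable is a measurable function
X : \Omega \to \mathbb{R}, \qquad
\{\omega : X(\omega) \le t\} \in \mathcal{F} \ \text{for every } t \in \mathbb{R}.

% For i.i.d. X_1, X_2, \dots with finite mean \mu, let
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.

% WLLN: convergence in probability
\lim_{n \to \infty} P\bigl( |\bar{X}_n - \mu| > \varepsilon \bigr) = 0
\quad \text{for every } \varepsilon > 0.

% SLLN: almost sure convergence
P\Bigl( \lim_{n \to \infty} \bar{X}_n = \mu \Bigr) = 1.
```

"Almost surely" means the event has probability one. The strong law puts the limit inside the probability, which is a statement about whole sample paths, and it implies the weak law but not the other way around.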
The problem with dumping analysis is that you will not know some basic things like what a continuous function is. What is everywhere continuous? What is a C1 function? And again, you will have a hard time reading and speaking.
For what it's worth, I found analysis to be not that fun, but measure-theoretic probability to be a really fun, tight theory. It was enjoyable to learn.
Measure theory being necessary to statistics is rather contentious; a better discussion is on Andrew Gelman's blog [1].
My school's PhD stats program does require real analysis before the prelims, but for most intents and purposes, 'multi' and 'linal' (as the cool kids say) should be sufficient for machine learning from a comp sci perspective.
I haven't fully worked through ESL (Hastie and Tibshirani's advanced version of ISLR, posted above) but the majority of the math there is linear algebra with some differential equations and calculus thrown in. I've heard Harvard Stat 210 and Berkeley Stat 205A/B cited as good examples of mathematical stat classes - if you're seriously interested maybe take a look at those syllabi.
Elementary (i.e., math-speak for "undergrad-level") probability theory is quite accessible to someone with only a computer scientist's math classes. You really don't need real analysis until you start reading research papers on probability and they drop down into measure theory for this-and-that.
I would say you can certainly get away with applying machine learning techniques without knowledge of probability theory, but if you want to do things like compare models, compare results, or determine the accuracy of your model, you are quickly going to have to dive into basic statistics (Bayesian and frequentist).
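Here is roughly what the frequentist side of "determine the accuracy of your model" can look like, as a hedged sketch. The test-set counts are hypothetical, `accuracy_interval` is a made-up helper, and the interval is the simple normal-approximation (Wald) interval, which is only one of several options.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Accuracy with a normal-approximation (Wald) interval; z=1.96 gives ~95%."""
    acc = correct / total
    se = math.sqrt(acc * (1.0 - acc) / total)  # binomial standard error
    return acc, acc - z * se, acc + z * se

# Hypothetical results: two models scored on 100 held-out examples each.
acc_a, lo_a, hi_a = accuracy_interval(85, 100)
acc_b, lo_b, hi_b = accuracy_interval(88, 100)

# Rough heuristic only: if the intervals overlap, the gap may be noise.
intervals_overlap = lo_b <= hi_a and lo_a <= hi_b
```

With only 100 test examples the two intervals overlap, so an 85% vs 88% difference is not convincing on its own. If both models are scored on the same test set, a paired comparison is the more appropriate tool than eyeballing two separate intervals.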
Jump in, the water is fine!
[1] http://web.stanford.edu/~hastie/pub.htm