Hacker News new | ask | show | jobs
by stochastic_monk 2905 days ago
The title of this page feels like a reference to the excellent Mohri et al. Foundations of Statistical Machine Learning. I’d recommend both it and Shai Shalev-Schwartz for VC theory/Rademacher complexity sorts of statistical ML.

However, I’m not crazy about this summary. It’s less foundational than it is a survey. I probably dislike the format and formatting more than anything, but I would not recommend this over other resources.

2 comments

This course is complementary to Mohri's excellent book and course. Many students at NYU take both courses, in either order (https://davidrosenberg.github.io/ml2018/ and https://cs.nyu.edu/~mohri/ml17/). Mohri's course builds a foundation for proving performance guarantees (yes, using tools such as VCdim and Rademacher complexity). This course tries to be practical, but not superficial. We do a careful study of multiple examples of the four fundamental components of an ML method: loss function, regularization, hypothesis space, and optimization method. (In a probabilistic setting, regularization becomes a prior and loss becomes a likelihood.) Framed in this way, it's usually much easier to understand or invent new methods. And within this framework, we absolutely try to survey as many methods as we have time to look at carefully.
Hi David! Thank you for your thorough response. I appreciate it. And I understand why one would choose to cover as much breadth as you do. I don’t think there’s an easy way to cover all the basics without feeling like a survey, but keeping track of core concepts across applications was what helped me understand how they were connected.
Yeah, this space feels pretty saturated already, and people seem to use the same topics / presentation / ordering over and over. What a ton of duplicate effort!

It'd be nice to see a course try to put its own spin on "machine learning" and 1) present standard topics in an unusual way and 2) include topics that intro ML students might not normally see.

Take a peek at our class [0]! We present a bunch of topics not covered in undergraduate courses (at least to our knowledge), such as proximal gradient descent, random features, etc., without referencing any probability. The course is aimed at mostly Sophomores/Juniors in any STEM field with the only prerequisite being the introductory linear algebra course EE103. [1] All of the material (minus solutions) is available online for both courses.

I'm probably highly biased, but I'd like to say this is a fairly fresh take which departs heavily from the usual CS229 (Ng's course) presentation style and order since it's meant for a completely different audience (and was, to be fair, written this past quarter, unlike 229 which was written perhaps 15 years ago).

---

[0] http://ee104.stanford.edu

[1] http://ee103.stanford.edu

I used Stanford’s materials for CS229 as a reference in two of the three graduate ML classes I’ve taken. I’ve been quite impressed, and I don’t doubt ee103 is also quite well done.
> All of the material (minus solutions) is available online for both courses.

I can only find lecture slide when looking at the course site.

Are there no readings or problems?

There are problems, but I’m afraid that we took them down since we need to rewrite a few before next year. Sorry! I’m sure poking around the site might yield some results, if you’re careful enough ;) (at least for 104, someone else is teaching 103 and I’m not sure what all they’re doing with the course, but all of the main problems are available on the book’s website).
Here are some of the things that I think are distinctive about the class (although certainly all of these are taught in some other class somewhere): discussion of approximation error, estimation error, and optimization error, rather than the more vague “bias / variance” trade off (common in more theoretical classes, such as Mohri's, but not common in practically-oriented classes); full treatment of gradient boosting, one of the most successful ML algorithms in use today (most courses stop with AdaBoost, which is rarely used for reasons discussed in the course); much more emphasis on conditional probability modeling than is typical (you give me an input, I give you a probability distribution over outcomes — useful for anomaly detection and prediction intervals, among other things), explanation (including pictures) for what happens with ridge, lasso, and elastic net in the [very common in practice] case of correlated features (I haven't seen this in another class, though I'm sure others have done it, somewhere); guided derivation (in homework) of when the penalty forms and constraint forms of regularization are equivalent, which gives you much more flexibility in how you formulate your problem (I know this is uncommon because it took me a very long time to find a reference), multiclass classification as an introduction to structure prediction (idea taken from Shalev-Shwartz and Ben-David's book). On the homework side, you’d code neural networks in a computation graph framework written from scratch in numpy; well, we're pretty relentless about implementing almost every major ML method we discuss from scratch in the homework.
Is that because they are all online? After all, how many comm 101 courses are taught every year?