Hacker News new | ask | show | jobs
by davidsrosenberg 2904 days ago
Here are some of the things that I think are distinctive about the class (although certainly all of these are taught in some other class somewhere): discussion of approximation error, estimation error, and optimization error, rather than the more vague “bias / variance” trade off (common in more theoretical classes, such as Mohri's, but not common in practically-oriented classes); full treatment of gradient boosting, one of the most successful ML algorithms in use today (most courses stop with AdaBoost, which is rarely used for reasons discussed in the course); much more emphasis on conditional probability modeling than is typical (you give me an input, I give you a probability distribution over outcomes — useful for anomaly detection and prediction intervals, among other things), explanation (including pictures) for what happens with ridge, lasso, and elastic net in the [very common in practice] case of correlated features (I haven't seen this in another class, though I'm sure others have done it, somewhere); guided derivation (in homework) of when the penalty forms and constraint forms of regularization are equivalent, which gives you much more flexibility in how you formulate your problem (I know this is uncommon because it took me a very long time to find a reference), multiclass classification as an introduction to structure prediction (idea taken from Shalev-Shwartz and Ben-David's book). On the homework side, you’d code neural networks in a computation graph framework written from scratch in numpy; well, we're pretty relentless about implementing almost every major ML method we discuss from scratch in the homework.