|
I looked over the whole list of topics. They have a page of questions about prerequisites, and for all the questions I was able to answer that I was fully comfortable with the question. But the questions have to do with probability and optimization, and my applied math Ph.D. was in those fields with research in stochastic optimal control. I can begin to see some of why Bloomberg is interested in the material: Maybe with all the economic and stock market data they collect, they can build some surprisingly useful predictive machine learning models. Okay. Maybe one good result will be that Michael Bloomberg will give some more money to Johns Hopkins! But otherwise I was disappointed: (1) Put simply and bluntly, it's all about just one now very old topic -- regression analysis. Well, when old regression analysis doesn't fit very well, then try logistic regression, ridge regression, regression trees, other forms of regression, other forms of curve fitting, e.g., neural networks. Or, look, guys, it's all just empirical curve fitting. Ptolemy tried empirical curve fitting. He used his epicycles. They didn't fit very well. Then, later, from work of Kepler, a falling apple, etc., Newton guessed that (A) there was a law of motion due to force equals mass time acceleration and (B) there was a force directly proportional to the product of two masses and inversely proportional the the square of the distance between them. That fit the data great! Lesson: Lots of things change continuously. Usually the changes are differentiable, that is, have a well defined tangent. Well, that tangent is linear. So at least locally, lots of things are linear. So, for lots of things, a promising first cut approach is to do a linear fit. For more, neural networks can approximate anything continuous, and lots of things are continuous. So, net, curve fitting has some promise and utility. But, still, curve fitting is just guessing without any real basis in science, e.g., couldn't find or replace what Newton did. Really, the now classic texts in regression analysis start with something the machine learning curve fitting does not: The classic texts assume that there really is a linear equation, that we would have the equation exactly except for some errors in the data, and then do some nice applied math to show how to get the errors down and get a good approximation to the equation that has already been assumed to exist. The assumptions can vary, stronger or weaker, but in the theory there was little or no role for just empirical fitting. Well, machine learning is charging ahead without that assumption that a linear equation exists. And, wonder of wonders, apparently often now such an equation, even as a good approximation, doesn't exist. So, we are back to empirical curve fitting, struggling like Ptolemy. We already know: Some successes are possible, but like Ptolemy we face some severe limitations. (2) Okay, when simple regression doesn't fit very well, we keep trying? Okay. Say, we try logistic regression, ridge regression, L1 or L2 regularization regression, regression trees, boosting, ..., neural networks, etc. Uh, which is better, L1 regularization or L2 regularization? Uh, this sounds like throwing stuff against the wall until something appears to stick. Sure, that can work at times, but are we really satisfied with that? Wouldn't we want some more solid reasons for the tool we pick? Lots of places elsewhere in applied math, applied probability, and applied statistics we do have solid reasons. With so many efforts to patch up regression, maybe we might suspect that for a lot of problems regression is not the right tool? (3) There is a lot more to applied statistics, applied probability, etc. and possibly of value for applications, maybe including Bloomberg's customers, than empirical curve fitting. Commonly this work has equal justification to be called machine learning because it also takes in data, estimates some parameters, builds a model, and gives results. Moreover, the better cases work with clear assumptions so that when the assumptions hold we know we have something solid, and some of the methods use meager assumptions that are relatively realistic in practice. The Bloomberg course is just empirical curve fitting, nearly all versions of regression analysis, and omits all the rest. On the applied math shelves of the research libraries, regression is only a tiny fraction of the whole. Where's the rest? |
> (2) Okay, when simple regression doesn't fit very well, we keep trying? Okay. Say, we try logistic regression, ridge regression, L1 or L2 regularization regression, regression trees, boosting, ..., neural networks, etc. Uh, which is better, L1 regularization or L2 regularization? Uh, this sounds like throwing stuff against the wall until something appears to stick. Sure, that can work at times, but are we really satisfied with that? Wouldn't we want some more solid reasons for the tool we pick? Lots of places elsewhere in applied math, applied probability, and applied statistics we do have solid reasons.
I'm wondering what these "solid reasons" could be? Some sort of experience based on past data? An example would be helpful.