|
Several of the comments in this thread clarify what the Bloomberg machine learning work is about. With that clarification, maybe I see some issues with the work.
We can start with just the simplest case, plain old ordinary regression with just one independent variable. So, for some positive integer n, we have pairs of real numbers (y_i, x_i) for i = 1, 2, ..., n. Here we are trying to predict the y_i from the x_i; that is, we are trying to build a model that will predict y from a corresponding x where x is not in the data. Our model is to be a straight line, e.g., in high school form y = ax + b.
Okay, doing this, with the classic assumptions, we can draw a graph of the data, the fitted line, and the confidence (prediction) interval at each real number x.
For some intuition, suppose the x_i are all between 0 and 5. Maybe then the fit is quite good, the line is close to the data, and the confidence intervals are small. But, IIRC, roughly or exactly, the confidence interval curves are actually hyperbolas. So, while the upper and lower curves are close for x between 0 and 5, for x outside the interval [0,5] the curves can grow far apart. So, if we are interested in the predictions of the model for, say, x = 20, the confidence intervals may be very wide, wide enough for us to conclude that, even though we have a straight line that fits our data closely, our model is still useless at x >= 20.
So, this little example illustrates a broad point about such curve fitting: The model might work well for an independent variable x (likely a vector of several components) close to the training data but commonly be awful otherwise.
How serious this situation is can vary a lot depending on the application. E.g., if we are interested only in the values of y for x a lot like the training data, e.g., in the interval [0,5], then maybe we don't care about the y value or the confidence interval when x = 20.
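For concreteness, the [0,5] versus x = 20 example can be sketched in a few lines. This is only a sketch under the classic assumptions (independent Gaussian errors with constant variance); the data are synthetic, the helper names `fit_line` and `half_width` are made up for illustration, and the hard-coded 2.101 is the two-sided 95% Student t quantile for 18 degrees of freedom.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b for one independent variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - a * x_bar
    rss = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    return a, b, s, x_bar, sxx


def half_width(x0, n, s, x_bar, sxx, t_crit, new_observation=True):
    """Half-width of the interval at x0.

    new_observation=True gives the prediction interval for a new y;
    new_observation=False gives the confidence interval for the fitted mean.
    """
    extra = 1.0 if new_observation else 0.0
    return t_crit * s * math.sqrt(extra + 1.0 / n + (x0 - x_bar) ** 2 / sxx)


# Synthetic training data (an assumption): 20 points with x_i in [0, 5].
rng = random.Random(0)
n = 20
xs = [5.0 * i / (n - 1) for i in range(n)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

a, b, s, x_bar, sxx = fit_line(xs, ys)
T_CRIT = 2.101  # two-sided 95% Student t quantile for n - 2 = 18 df

for x0 in (2.5, 20.0):
    pred = half_width(x0, n, s, x_bar, sxx, T_CRIT)
    mean = half_width(x0, n, s, x_bar, sxx, T_CRIT, new_observation=False)
    print(f"x0 = {x0:5.1f}  prediction +/- {pred:.2f}  fitted mean +/- {mean:.2f}")
```

The (x0 - x_bar)^2 term under the square root is what makes the bands hyperbolas. With this particular design the prediction band at x = 20 is roughly 2.7 times as wide as at x = 2.5, and the band for the fitted mean is more than 11 times as wide; those ratios depend only on where the x_i sit, not on the noise.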
But quite broadly there are applications where what we want such models to tell us is the value of y for some x not close to what we have seen in our training data.
Uh, let's see: IIRC Bloomberg is selling stock market and economic data, often in nearly real time, to investors, some of whom are traders and make trades quickly, within a few seconds, based on the data from Bloomberg.
I'm no up-to-date expert on just what the Bloomberg customers are doing, but from 20,000 feet or so up maybe the situation is something like this. Broadly the investors vary:
(A) Some investors want a portfolio constructed much like in the work of H. Markowitz or W. Sharpe. In simple terms, they want the portfolio to have good expected return with low standard deviation of return and maybe, then, buy on margin to raise the rate of return while still keeping the risk relatively low.
(B) Some investors are interested in relationships between stocks and options -- e.g., the Black-Scholes work is an example of this. IIRC, a more general case is some stochastic process, maybe Brownian motion, reaching a boundary and a first exit. The exit has a value, so the investment problem is a boundary value problem. IIRC one can design and attempt to evaluate exotic options with such ideas.
(C) Some investors are just stock pickers and buy when they sense a sudden rise in price.
But a theme in (A)-(C) is that the investors are looking for something unusual. So, in the model building, what is unusual in the application may also have been unusual, i.e., rare, in the training data and the testing data. In that case, without more assumptions, theories, or whatever, the prediction of the model for unusual input data may be poor.
That is, it appears that the model building techniques promise that the model will do poorly in just the application cases of greatest interest to investors -- the unusual cases that are not well represented in the training and test data.
So, maybe as a first cut, some of what is needed is some anomaly detection.
So, we could use more information about the systems we are trying to model. A linearity assumption is one such piece of information. With Newton's second law and law of gravity, we can check them on falling apples. Next we can try them on the planets in our solar system and, nicely enough, see that they work. And then we can be pretty sure for a rocket at Mach 15 headed for stationary orbit, etc. But with just empirical curve fitting, apparently mostly we don't have such additional information.
IIRC L. Breiman's first interest in empirical curve fitting was for clinical medical data. So, maybe in that data he was trying to predict some disease, but the independent variable values he cared about were common in his training and testing data. I.e., he wasn't really looking to exploit some anomaly for some once-in-20-years way to get rich quick. |
Anomaly detection is definitely another important area, but I struggle to pull together a coherent unit on the topic. One issue is that it’s difficult to define precisely, at least partly because everybody means something a little bit different by it.
Also, based on classical hypothesis testing, I think that to some extent you have to know what you’re looking for to be able to detect it (i.e., to have power against the alternatives/anomalies you care about)... For that reason, I think it’s hard to separate anomaly detection from more general risk analysis/assessment, because you need to know the type of thing you care about finding.
In any case, I made an attempt on anomaly detection: There's https://bloomberg.github.io/foml/#lecture-15-citysense-proba... which is simply about building a conditional probability model and flagging behavior as anomalous if it has low probability or probability density under the model. I also used to have 1-class SVMs in a homework (https://davidrosenberg.github.io/mlcourse/Archive/2017/Homew... Problem 11).
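A minimal sketch of the "flag it if the density is low under a conditional model" idea, just for illustration. It is not taken from the lecture: the per-context Gaussian model, the toy data, the context labels, the threshold of -10 on the log density, and the function names are all assumptions.

```python
import math
from collections import defaultdict


def fit_conditional_gaussians(records):
    """Fit one Gaussian per context from (context, value) pairs.

    Returns a dict mapping context -> (mean, std). This stands in for
    whatever conditional probability model p(value | context) is used.
    """
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)
    model = {}
    for context, values in groups.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[context] = (mean, max(math.sqrt(var), 1e-9))  # avoid std == 0
    return model


def log_density(model, context, value):
    """Gaussian log density of value under the model for this context."""
    mean, std = model[context]
    z = (value - mean) / std
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * z * z


def is_anomalous(model, context, value, threshold=-10.0):
    """Flag a value whose log density falls below the threshold.

    A context never seen in training is also flagged, since the model
    has nothing to say about it (a design choice, not a requirement).
    """
    if context not in model:
        return True
    return log_density(model, context, value) < threshold


# Toy data (hypothetical): activity counts by time-of-day context.
training = [("night", v) for v in (2, 3, 2, 4, 3)] + \
           [("rush", v) for v in (40, 45, 50, 42, 48)]
model = fit_conditional_gaussians(training)

for context, value in [("rush", 45.0), ("night", 45.0), ("weekend", 3.0)]:
    print(context, value, "anomalous" if is_anomalous(model, context, value) else "ok")
```

The same count of 45 is ordinary in one context and flagged in the other, which is the point of conditioning. Choosing the threshold is where the "you have to know what you're looking for" issue comes back in.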