One very old tool for such things was called "stepwise regression". IIRC J. Tukey was partially involved in that.

It appears that the AI/ML work is close to the regression and curve fitting that go back strongly to the early days of computers in the 1960s, with a lot in the social sciences going back to the 1940s and even to about 1900.

A lot is known. E.g., there's the now classic Draper and Smith, Applied Regression Analysis. Software such as the IBM Scientific Subroutine Package (SSP), SPSS (Statistical Package for the Social Sciences), SAS (Statistical Analysis System), etc. does the arithmetic for texts such as Draper and Smith.

For some decades some of the best users of such applied math were the empirical macroeconomic model builders. E.g., once at a hearing in Congress I heard a guy, IIRC, Adams talking about that.

Lesson: If you are going to do curve fitting for model building, then a lot is known.

Maybe what is new is working with millions of independent variables and trillions of bytes of data. But it stands to reason that there will also be problems with 1, 2, 1 dozen, or 2 dozen variables and some thousands or millions of bytes of data, and people have been doing a lot of work like that for over half a century. Sometimes they did good work. If you want to do model building on that more modest and common scale, my guess is that you should look mostly at the old, very well done work.

Here is just a really short sampling of some of that old work:

Stephen E. Fienberg,
The Analysis of Cross-Classified Data,
ISBN 0-262-06063-9,
MIT Press,
Cambridge, Massachusetts,
1979.

Yvonne M. M. Bishop,
Stephen E. Fienberg,
Paul W. Holland,
Discrete Multivariate Analysis:
Theory and Practice,
ISBN 0-262-52040-0,
MIT Press,
Cambridge, Massachusetts,
1979.

Shelby J. Haberman,
Analysis of Qualitative Data,
Volume 1,
Introductory Topics,
ISBN 0-12-312501-4,
Academic Press,
1978.

Shelby J. Haberman,
Analysis of Qualitative Data,
Volume 2,
New Developments,
ISBN 0-12-312502-2,
Academic Press,
1979.

Henry Scheffe,
Analysis of Variance,
John Wiley and Sons,
New York,
1967.

C. Radhakrishna Rao,
Linear Statistical Inference and
Its Applications:
Second Edition,
ISBN 0-471-70823-2,
John Wiley and Sons,
New York,
1967.

N. R. Draper and
H. Smith,
Applied Regression Analysis,
John Wiley and Sons,
New York,
1968.

Leo Breiman,
Jerome H. Friedman,
Richard A. Olshen,
Charles J. Stone,
Classification and Regression Trees,
ISBN 0-534-98054-6,
Wadsworth & Brooks/Cole,
Pacific Grove, California,
1984.

There is a lesson about curve fitting: There was the ancient Greek Ptolemy who took data on the motions of the planets and fitted circles and circles inside circles, etc. and supposedly, except for some use of Kelly's Variable Constant and Finkel's Fudge Factor, got good fits. The problem: his circles had next to nothing to do with planetary motion; instead, that motion is based on ellipses, and that understanding came from more observations, Kepler, and Newton.

Lesson: Empirical curve fitting is not the only approach.

Actually the more mathematical statistics texts, e.g., the ones with theorems and proofs, say, "We KNOW that our system is linear and has just these variables and we KNOW about the statistical properties of our data, e.g., Gaussian errors, independent and identically distributed, and ALL we want to do is just get some good estimates of the coefficients with confidence intervals and t-tests and confidence intervals on predicted values." Then you can go through all that statistics and see how to do that. But notice the assumptions at the beginning: We KNOW the system is linear, etc. and are ONLY trying to estimate the coefficients that we KNOW exist. That's long been a bit distant from practice and is apparently still farther from current ML practice.

Okay, ML for image processing. Okay. I am unsure about how much image processing there is to do where there is enough good data for the ML techniques to do well.

Generally there is much, much more to what can be done with applied math, applied probability, and statistics than curve fitting. My view is that the real opportunities are in this much larger area and not in the recent, comparatively small area of ML.

E.g., my startup has some original work in applied probability. Some of that work does some things some people in statistics said could not be done. No, it's doable: But it's not in the books. What is in the books is asking too much from my data. So, the books are trying for too much, and with my data that's impossible. But I'm asking for less than is in the books, and that is possible from my data.

I can't go into details in public, but my lesson is this: There is a lot in applied math and applications that is really powerful and not currently popular, canned, etc.
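
For concreteness, here is a minimal sketch of the textbook setup described above: a system assumed to be linear in known variables with i.i.d. Gaussian errors, where the only job is to estimate the coefficients and attach standard errors, t-statistics, and confidence intervals. Everything in it is an assumption for illustration: the simulated data, the names `ols_summary` and `true_beta`, and the use of a normal quantile in place of the exact t quantile (reasonable only for large samples). It is not taken from any of the books listed.

```python
import numpy as np
from statistics import NormalDist


def ols_summary(X, y, level=0.95):
    """Least-squares coefficients with standard errors, t-statistics, and
    large-sample confidence intervals (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Coefficient estimates minimizing the sum of squared residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased estimate of the error variance under the linear model.
    sigma2 = resid @ resid / (n - p)
    # Covariance matrix of the coefficient estimates.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    t = beta / se
    # Assumption: normal quantile used as a large-n stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = np.column_stack([beta - z * se, beta + z * se])
    return beta, se, t, ci


# Simulated data where the "we KNOW the system is linear" assumption holds.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=1.0, size=n)

beta, se, t, ci = ols_summary(X, y)
for name, b, s, tv, (lo, hi) in zip(["const", "x1", "x2"], beta, se, t, ci):
    print(f"{name}: est={b:.3f} se={s:.3f} t={tv:.1f} ci=({lo:.3f}, {hi:.3f})")
```

The intervals and t-statistics are only meaningful if the assumptions at the top really hold; the arithmetic runs just as happily on data where they do not.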
http://www.sascommunity.org/mwiki/images/e/e2/NYASUG-2007-Ju...
http://www.barryquinn.com/the-statistical-dangerous-of-stepw...
Shrinkage methods like lasso/elasticnet are less susceptible to these problems.
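
To make the shrinkage point a bit more concrete, here is a small sketch of the lasso fitted by coordinate descent, next to plain least squares, on simulated data where only 2 of 10 variables matter. The data, the penalty value `lam=0.3`, and the helper names `soft_threshold` and `lasso_cd` are all assumptions chosen for illustration; in practice the penalty would be picked by cross-validation, and elastic net (not shown) adds a ridge term to the same idea.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink v toward zero by lam, returning exactly zero if |v| <= lam."""
    return np.sign(v) * max(abs(v) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.
    Assumes the columns of X are standardized and y is centered (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]        # take variable j out of the fit
            rho = X[:, j] @ resid / n      # its correlation with what is left
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * b[j]        # put the updated variable back
    return b


# Simulated data: 10 candidate variables, only the first two have any effect.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_b = np.zeros(p)
true_b[0], true_b[1] = 3.0, -2.0
y = X @ true_b + rng.normal(size=n)

# Standardize the columns and center the response.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_lasso = lasso_cd(X, y, lam=0.3)

print("OLS:  ", np.round(b_ols, 3))
print("lasso:", np.round(b_lasso, 3))
```

Least squares gives every noise variable some small nonzero coefficient, which is the raw material stepwise selection then picks over. The lasso penalty instead shrinks all coefficients and sets the weak ones to exactly zero in a single fit, at the cost of some bias in the coefficients it keeps.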