by graycat 2905 days ago
How I learned: Early in my career, I was around DC doing mostly work in applied math and computing for US national security. No joke -- constantly the work was heavily probability, statistics, and stochastic processes. I had a good ugrad math major but no courses in any of those three subjects. So I was thrown into the deep end of the pool and was constantly struggling to understand. I did pick up a good overview and a lot of intuition. But the sources varied widely, in both the topics and the quality, over stacks of books and papers, documentation of software, etc.

Lesson: At least at first, one way to learn is just to jump in at the deep end and struggle using lots of texts, references, etc.

Sad Lesson: While nearly all the famous books were good, some of the books that, e.g., from the publisher, might have seemed good were not. The guy who wrote the stuff, I hope he got tenure -- can't be any other reason.

Later I got an applied math Ph.D. and had a terrific course in analysis and probability. So, the analysis part was, right, basically Royden, Real Analysis and the first half of Rudin, Real and Complex Analysis. There was also some material from Oxtoby, Measure and Category (ice cream and cake dessert -- super fun stuff).

The probability was, right from the beginning, sigma algebras, etc. So, a central topic was the Radon-Nikodym theorem and conditional expectation -- gorgeous once you see it. So, there was beautiful coverage of the classic limit theorems, especially martingales.

Best course of any kind I ever took in school. The prof was a star student of E. Cinlar, long at Princeton.

For the course, the main texts in probability were from J. Neveu, L. Breiman, K. Chung, M. Loeve.

For statistics, for the applied stuff, I just remember the stacks of books I worked with early on, especially multivariate statistics. For the math, I just regard that as applied probability and sometimes just do my own derivations, sometimes at least a little new. I never found a statistics book I like or can recommend as the single, main book, e.g., like Rudin in analysis or Neveu in probability. All I can suggest is just to dig into the stacks of the most famous books and also glance at some of the software documentation.

I suspect that there is a really good statistics book to be written, and maybe someone has written it, or is writing it, but I haven't seen it.

Here is a simple derivation I typed in yesterday with an intuitive result in statistics that maybe people should keep in mind. In a sense this little derivation shows what the strongest possible result in statistical estimation is, and may I have the envelope please [drum roll], the discrete data version of the winner is just cross tabulation, assuming that we have enough data.

The context is a person applying for credit. Might proceed similarly for, say, ad targeting, etc.

We assume that Y is a real valued random variable where E[Y^2], that is, the expectation of Y^2, is finite -- meager assumption, especially for practice.

The Y is something about credit worthiness, e.g., loss on a loan, we are interested in.

We assume that X is a random variable taking possibly very general values, e.g., a credit history at uncountably infinitely many points in time in the past. We assume that we have the value of X -- that's our credit data on the person.

Let's do a little preliminary derivation: What value of real number a minimizes

E[(Y - a)^2]

Well, we have

E[(Y - a)^2]

= E[Y^2 - 2 Ya + a^2]

= E[Y^2] - 2aE[Y] + a^2

= E[Y^2] + E[Y]^2 - 2aE[Y] + a^2 - E[Y]^2

= E[Y^2] + (E[Y] - a)^2 - E[Y]^2

which we minimize with a = E[Y].

Or, for one interpretation, the minimum rotational moment of inertia is for rotation about the center of mass.
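
A quick numerical check of the preliminary derivation, as a sketch only: the sample values are made up, and mean_squared_error is a hypothetical helper name. Using the empirical distribution of a small sample in place of the distribution of Y, the same algebra applies, so the sample mean should beat every other constant a on the grid.

```python
# Sketch: for the empirical distribution of a small sample, the constant a
# that minimizes the average of (y - a)^2 is the sample mean.
# The data values are invented for illustration.

def mean_squared_error(ys, a):
    """Average of (y - a)^2 over the sample, the empirical E[(Y - a)^2]."""
    return sum((y - a) ** 2 for y in ys) / len(ys)

ys = [2.0, 3.0, 5.0, 7.0, 11.0]
mean_y = sum(ys) / len(ys)

# Try a grid of candidate constants from 0.0 to 12.0 in steps of 0.1.
candidates = [i / 10 for i in range(0, 121)]
best_a = min(candidates, key=lambda a: mean_squared_error(ys, a))

print("sample mean:", mean_y)
print("best a on the grid:", best_a)
```

The grid search is deliberately crude; the point is only that nothing on the grid does better than the mean, as the completed square says.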

So, for our main concern, suppose we want to use the data we have, X, to approximate Y. So, we want a real valued function f with domain the possible values of X so that f(X) approximates Y.

For the most accurate approximation, we want to minimize

E[(Y - f(X))]^2

Claim: For f(X) we want

f(X) = E[Y|X]

So, f(X), using X, is the best non-linear least squares approximation to Y.

Proof:

We start by using two properties of conditional expectation, E[Z] = E[ E[Z|X] ] and E[f(X)Y|X] = f(X)E[Y|X], and then continue with just simple algebra:

E[(Y - f(X))^2]

= E[ E[Y^2 - 2Yf(X) + f(X)^2|X] ]

= E[ E[Y^2|X] - 2f(X)E[Y|X]

+ f(X)^2 ]

= E[ E[Y^2|X] + E[Y|X]^2 - 2f(X)E[Y|X]

+ f(X)^2 - E[Y|X]^2 ]

= E[ E[Y^2|X]

+ (E[Y|X] - f(X))^2

- E[Y|X]^2 ]

which is minimized with

f(X) = E[Y|X]

Done.
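
To make the cross tabulation remark concrete, here is a minimal sketch under loud assumptions: X takes only three made-up category values, the (x, y) pairs are invented, and cell_means and mse are hypothetical helper names. With discrete X, the cell means are the empirical version of E[Y|X], and on the same data they should have smaller mean squared error than the competing functions of X tried here.

```python
# Sketch: with discrete X, the cell means of a cross tabulation are the
# empirical version of E[Y|X]. All data below are invented.
from collections import defaultdict

# (x, y) pairs: x is a hypothetical credit category, y a hypothetical loss.
data = [
    ("good", 0.0), ("good", 1.0), ("good", 2.0),
    ("fair", 3.0), ("fair", 5.0),
    ("poor", 8.0), ("poor", 10.0), ("poor", 12.0),
]

def cell_means(pairs):
    """Cross tabulation: average y within each value of x."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, y in pairs:
        totals[x] += y
        counts[x] += 1
    return {x: totals[x] / counts[x] for x in totals}

def mse(pairs, f):
    """Empirical E[(Y - f(X))^2], where f is a dict from x to a prediction."""
    return sum((y - f[x]) ** 2 for x, y in pairs) / len(pairs)

f_best = cell_means(data)

# Competitor 1: the overall mean, ignoring X entirely.
overall = sum(y for _, y in data) / len(data)
f_constant = {x: overall for x in f_best}

# Competitor 2: the cell means shifted by an arbitrary amount.
f_shifted = {x: m + 0.5 for x, m in f_best.items()}

print("cell means:", f_best)
print("MSE, cell means:  ", mse(data, f_best))
print("MSE, overall mean:", mse(data, f_constant))
print("MSE, shifted:     ", mse(data, f_shifted))
```

This only compares against two competitors on one toy data set; the derivation above is what covers every f.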

3 comments

I think if you have such an exposure to so many stat and probability books, and yet cannot recommend one good one, then clearly you are the person fated by the universe to write that one book written correctly. :)
Back in grad school, my fellow students and I were amazed at how polished were the books of Rudin, Neveu, Royden, Luenberger, Dynkin, etc. but how comparatively ..., say not good, were the books in statistics. There are some hints that there are more good statistics books now.

One of my fellow students was very capable, and I was hoping would write a good book; I doubt if he ever got around to it.

I'm glad the statistics community has at least one foot in important applications, but both feet? Way back there was Cramer. U. Grenander was long at Brown University's Division of Applied Mathematics -- maybe he could have written a Cramer Volume II.

I'd like to see (A) much more polish on the foundations and then (B) some of the keys to some of the more important applications, selected with good insight and expertise.

Some of the application areas where I suspect, with varying degrees of strength, there is some good work include (i) particle physics such as at the LHC, (ii) a huge range of bio-medical research, (iii) high end military radar, sonar, and tracking more generally.

When I was in grad school, some of the gossip was that statistics of sample paths of stochastic processes was a wide open field -- I suspect it still is.

Apparently in the US, for well done theory, at least in attitudes, statistics is a poor cousin of probability theory, which is a poor cousin of pure math, and stochastic processes is just out of the picture.

I haven't tried to be a statistician, but I've done some projects and gotten some results. But for each of the results, clearly there were plenty of loose ends and more to do but without any very clear theory, examples, experience, or methods to tie off the loose ends.

Maybe here's one -- maybe since I'm not putting a lot into this just now: Above I gave my little derivation that with data X, the best estimate of Y is E[Y|X] with the idea that this partly justifies cross tabulation as the discrete version. Okay, but X might be a sample path of a history of a stochastic process with lots of dimensions with goofy data types. So, maybe to cut down some on the exponential explosion of the data required for cross tabulation on several variables, exponential in the number of variables, pick and choose the variables. Okay, but first cut we have not even zip, zilch, or zero on how to do that.
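
For a rough feel of that exponential explosion, a small sketch: the number of cells in a cross tabulation is the product of the number of levels of each variable. The level counts and the sample size are invented, and num_cells is a hypothetical helper name.

```python
# Sketch: cell count of a cross tabulation versus number of variables.
# Four levels per variable and one million observations are assumptions
# made up for illustration.
from math import prod

def num_cells(levels_per_variable):
    """Number of cells: the product of the level counts of the variables."""
    return prod(levels_per_variable)

n_observations = 1_000_000  # hypothetical sample size

for k in (5, 10, 20, 30):
    cells = num_cells([4] * k)  # k variables with 4 levels each
    per_cell = n_observations / cells
    print(k, "variables:", cells, "cells,", per_cell, "observations per cell on average")
```

Somewhere around ten such variables the average cell already has fewer than one observation, which is the reason for wanting to pick and choose.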

Once I published a paper on multi-dimensional, distribution-free statistical hypothesis tests, intended for zero-day computer security. But, again, the number of variables with data is huge; we encounter another exponential explosion and would like some help on which variables to choose.

Very broadly, from 200,000 feet up, we get to choose the variables to use and then want to know something about the accuracy of our results -- too often we are left to use the TIFO (try it and find out) method, some form of Monte Carlo, resampling techniques (B. Efron, P. Diaconis), deleting some variables or observations and trying again, etc.
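
As one example of the resampling idea, here is a minimal sketch of a plain bootstrap estimate of a standard error. This is a generic textbook version, not anything specific from Efron or Diaconis; the loss values, the number of resamples, the seed, and the name bootstrap_standard_error are all assumptions for illustration.

```python
# Sketch: bootstrap standard error of a statistic by resampling the data
# with replacement. The data and parameters are invented.
import random
import statistics

def bootstrap_standard_error(sample, statistic, n_resamples=2000, seed=0):
    """Standard deviation of the statistic over resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

# Hypothetical losses on ten loans.
losses = [0.0, 0.0, 1.5, 2.0, 2.5, 3.0, 4.0, 7.5, 9.0, 12.0]

se_mean = bootstrap_standard_error(losses, statistics.mean)
se_median = bootstrap_standard_error(losses, statistics.median)

print("bootstrap standard error of the mean:  ", se_mean)
print("bootstrap standard error of the median:", se_median)
```

It is still TIFO, just organized: it gives a number for the accuracy but no help on which variables to choose.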

My guess is that finding a welcoming department in a research university and/or an interested problem sponsor in a funding agency would be too tough.

Can you recommend what is the best sequence of books in your opinion to get to rigorous probability theory? E.g.:

- Principles of Mathematical Analysis, Rudin

- Finite-dimensional vector spaces, Halmos

- Real Analysis, Royden

- Mathematical Foundations of the Calculus of Probability, Neveu

Would that be something like this? Any alternatives?

To make the study easier and better rounded, just add some material.

Rudin's Principles is really nice, especially in retrospect once you understand it, but going in as a student it can seem quite severe. It's precise but not too severe -- he just makes you go a chapter or two before it becomes clear why he is doing what he is doing.

To help, my nutshell view is that mostly he is just trying to develop the Riemann (Stieltjes) integral. His main result is, the Riemann integral exists for continuous functions on compact sets. So, then, he needs to say what a compact set is. Well, the most relevant example is just a closed interval of real numbers such as [a,b]. So, why compact? Because every continuous function on a compact set is uniformly continuous, and that lets us know that the Riemann sums converge. What is compact? Every open cover has a finite subcover, and that lets us get uniform continuity. And, in R^n, a set is compact if and only if it is closed and bounded. So, Rudin needs to talk about closed versus open -- he does that on metric spaces although really he needs it only on R^n.

So, net, he starts with metric spaces and discusses open, closed, and compact. Then he shows that in R^n, compact is the same as closed and bounded. He shows that a continuous function on a compact set is uniformly continuous. Then, presto, he shows that the Riemann (or Riemann-Stieltjes if you wish) sums converge and the Riemann integral exists.
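
To see the end of that story numerically, a small sketch: Riemann sums of a continuous function on a closed interval [a,b] settling down to the integral. The choice of sin on [0, pi], with exact integral 2, and the name riemann_sum are mine for illustration, not from Rudin.

```python
# Sketch: left-endpoint Riemann sums of a continuous function on [a, b]
# approach the integral as the partition gets finer.
import math

def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f on [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width

exact = 2.0  # integral of sin over [0, pi]

for n in (10, 100, 1000, 10000):
    approx = riemann_sum(math.sin, 0.0, math.pi, n)
    print(n, approx, abs(approx - exact))
```

Uniform continuity is what guarantees this for every continuous f on [a,b], not just a nice one like sin.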

He does some nice work on infinite sequences and series, and the main reason is that he uses those tools to show lots of limits exist, e.g., for sines, cosines, and Fourier series.

There's more of value in Principles, but IMHO I gave you a good start to make the book easier. I wish I'd been given that outline when I was working through Principles at 1+ hour a page.

But the Lebesgue integral in Royden is the one to take fully seriously.

Make Halmos the second or third text on linear algebra. And then look at some quantum mechanics where they discuss eigenvalues, eigenvectors, Hermitian, unitary, and the spectral decomposition! Right, the Halmos book is baby Hilbert space. Then look at some applied connections, e.g.,

George E. Forsythe and Cleve B. Moler, Computer Solution of Linear Algebraic Systems,

Maybe spend an evening on the documentation of LINPACK.

Some weekend of great fun, take a fast pass through the Gauss, ..., Stokes theorem parts of

Tom M. Apostol, Mathematical Analysis: A Modern Approach to Advanced Calculus,

where you don't take the proofs very seriously but see how physical science and engineering look at calculus of several variables.

Neveu is a great last probability text but not a good first text. So, before Neveu, look quickly, not very seriously, at whatever, including in some introductory statistics texts.

Also, Breiman's Probability is easier to read than Neveu. So is K. L. Chung's competing book. And there are others.

There is some more advanced material, e.g.,

Ioannis Karatzas and Steven E. Shreve, Brownian Motion and Stochastic Calculus, Second Edition,

Good luck!

Many thanks for taking the time to write this! It's much appreciated.
Errata: The expression

E[(Y - f(X))]^2

of course, and as later in the derivation, should read

E[(Y - f(X))^2]

I'll have to advise my typist to do better in the future!!!