
product space. Two examples are the real line and the 3 space we live in (at least locally ignoring general relativity).

But the biggie points about the Hilbert space definition are (A) the other, somewhat astounding (vector space), examples and (B) how much we can do in Hilbert space that is close to good old high school plane (2D) and solid (3D) geometry and close to good, old freshman calculus with limits, convergence, etc. So, in particular, in Hilbert space we get the Pythagorean theorem of plane geometry and perpendicular projections as the shortest distance from a point to a plane in solid geometry -- two biggies for pure/applied math. We get the triangle inequality. With the inner product, we get angles and orthogonality. And the core data we need for projections are just some inner products.
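To make the remark about projections concrete, here is a minimal sketch in plain Python for the plane-in-3-space case. The function names and the vectors x, b1, b2 are made up for illustration; the point is only that the projection comes out of a handful of inner products, and that the Pythagorean theorem then holds for the pieces.

```python
def inner(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_plane(x, b1, b2):
    """Perpendicular projection of x onto the plane spanned by b1 and b2.

    Only inner products are used: solve the 2x2 system G c = r with
    G[i][j] = <bi, bj> and r[i] = <bi, x> (Cramer's rule below).
    """
    g11, g12, g22 = inner(b1, b1), inner(b1, b2), inner(b2, b2)
    r1, r2 = inner(b1, x), inner(b2, x)
    det = g11 * g22 - g12 * g12  # nonzero when b1, b2 are linearly independent
    c1 = (r1 * g22 - r2 * g12) / det
    c2 = (g11 * r2 - g12 * r1) / det
    return [c1 * p + c2 * q for p, q in zip(b1, b2)]

# Hypothetical example vectors
x = [1.0, 2.0, 0.0]
b1 = [1.0, 0.0, 1.0]
b2 = [0.0, 1.0, 1.0]

p = project_onto_plane(x, b1, b2)
residual = [xi - pi for xi, pi in zip(x, p)]

# The residual is orthogonal to the plane, so |x|^2 = |p|^2 + |residual|^2
print(p, residual)
print(inner(x, x), inner(p, p) + inner(residual, residual))
```

The same recipe works in any Hilbert space once you can compute the inner products; nothing in it depends on the vectors being triples of numbers.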

The most important examples of a Hilbert space are the set of real numbers R, the set of complex numbers C, and, for positive integer n, the vector spaces of linear algebra R^n and C^n -- e.g., R^n is the set of all n-tuples of real numbers, and C^n is the set of all n-tuples of complex numbers. Then as in whatever first course you had that talked about vectors and dot (inner, scalar) products, with those inner products (for C^n, with a complex conjugate on one factor in each term) R^n and C^n are Hilbert spaces.

The part about complete is a generalization of completeness in the real numbers, that is, the biggie way the real numbers are better than just the rational numbers. In short, for example, intuitively, in the real numbers, if a sequence appears to converge, then there really is a real number there for it to converge to. Of course, that's not true in the rational numbers since, e.g., can have a sequence of rational numbers that converges to the square root of 2 but doesn't really converge in the rational numbers because the square root of 2 is not a rational number. This stuff about "appears to converge" is called Cauchy convergence and is a weak definition of convergence. The point about completeness is that Cauchy convergent sequences really are convergent, that is, have something to converge to and do converge to it (essentially in the sense of limits you saw in calculus or high school algebra). If we are taking limits to define or approximate what we really want, then we also really want completeness so that what we converge to exists. So, that's completeness -- for a Hilbert space, we insist on that.
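For a look at such a sequence, here is a small sketch using Python's exact rational arithmetic. The choice of Newton's iteration x -> (x + 2/x)/2 starting from 1 is just one convenient way to produce the rationals; any sequence of rationals closing in on the square root of 2 would make the same point.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational Newton iterates x -> (x + 2/x) / 2, starting from 1."""
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq

seq = newton_sqrt2(6)

# Gaps between consecutive terms: they shrink fast, the "appears to converge" part
gaps = [abs(b - a) for a, b in zip(seq, seq[1:])]

# How far each term's square is from 2: tiny, but never exactly zero
errors = [abs(x * x - 2) for x in seq]

for x, e in zip(seq, errors):
    print(x, float(e))
```

Every term is an honest rational number and the terms bunch up as tightly as you like, but the squares never hit 2 exactly, so inside the rationals there is nothing for the sequence to converge to.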

Of course, need a background in linear algebra. So, for a first book, get any of the popular ones. If you wish, concentrate on the more geometrical parts and do less on the algebraic parts -- e.g., if there is a chapter on group representation theory, Galois theory, linear algebra over finite fields, where things go wrong when the field is the rationals, or algebraic coding theory, then feel free to leave that material for later. Likely should pay attention to dual spaces, but if wish can go light on adjoint transformations since they are less interesting when have an inner product. Curiously, can go light on change of basis for linear transformations, that is, the difference between vectors and coordinate vectors. Concentrate on dimension, linear independence, linear transformations, maybe touch on quotient spaces, use Gauss elimination as a good example of such things, eigenvalues and eigenvectors, orthogonality, and the polar decomposition (the core of factor analysis, singular value decomposition, matrix condition number, and more). If there is a little on the associated numerical analysis, then go ahead -- e.g., learn to accumulate inner products in double precision -- sure, take 10 minutes to see how to add iterative improvement to Gauss elimination and matrix inversion. For pseudo-inverses, cute material, but, especially for your question, likely won't see it again and can skip it.

Then if have some time, take a fast pass through the classic, Halmos, Finite Dimensional Vector Spaces. It was published in 1942, a few years after Halmos had gotten his Ph.D. under J. Doob (e.g., as in Stochastic Processes and, more recently, Classical Potential Theory and Its Probabilistic Counterpart) and showed up at the Institute for Advanced Study in Princeton and asked to be an assistant to John von Neumann, likely the inventor of Hilbert space. Well, Halmos wrote his book to be a finite dimensional version of linear algebra as if it were Hilbert space, which is commonly infinite dimensional. So you get a gentle introduction to Hilbert space. You get good at eigenvalues and eigenvectors, the polar decomposition, orthogonality, transformations that preserve distances and angles, etc. You get a lot of geometric intuition.

BTW, at one time, Harvard's Math 55 used Halmos, Baby Rudin (below), and a book by Spivak as the three main references.

Get a start on probability and statistics. A college junior level course should be sufficient. Don't take the course too seriously since you will redo all the good parts from a much better foundation soon! Note: Elementary stat courses commonly get all wound up about probability distributions. Well, they do exist, are at the core of probability theory, and are very important, both in theory and conceptually, but, actually, in practice usually they require more data than you will likely have, especially in dimensions above 1. So, in practice, mostly can't actually see the actual data of the distributions you are working with! You should hear about the uniform, exponential, chi-squared, and Gaussian and go light on the rest. The Gaussian is profound and won't go away even in practice although it is less important in practice than long assumed in, say, educational statistics.

Then take a good pass through at least the first parts of Rudin, Principles of Mathematical Analysis, AKA Baby Rudin. For the exterior algebra in the back of the more recent editions, well, likely get that elsewhere, say, now in English, directly from Cartan. The first parts of Baby Rudin cover metric spaces well enough. So, in Baby Rudin, get good at working with the limits, completeness property, compactness, etc. of mathematical analysis, that is, not algebra, geometry, topology, or foundations, although the metric space material overlaps with the part of topology called point set topology.

Then learn more in mathematical analysis, in particular, measure theory. The integral of measure theory essentially replaces the Riemann integral you learned in calculus and Baby Rudin. 'Bout time! Net, measure theory is a slightly different way to use limits to define the integral (areas, volumes, etc.) -- the first, biggie difference is that we do the partitioning on the Y axis instead of on the X axis. The biggie reason: The resulting integral easily handles the pathological cases, especially involving limits, that the Riemann integral struggles with. Don't worry: The integral of, say, x^2 over [0,1] is still the same, IIRC, 1/3rd, right? But consider the function f: [0,1] --> R where f(x) = 1 if x is rational and 0 otherwise. Then the Riemann integral of f over [0,1] does not exist, but the measure theory integral does and gives 0 for the result since the set of rational numbers has measure 0.
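Here is a rough numerical sketch of the "partition the Y axis" idea for x^2 over [0,1], next to the usual X axis partitioning. It leans on one fact special to this example: the set of x in [0,1] where x^2 lands in [a, b) is the interval [sqrt(a), sqrt(b)), so its measure is just its length. The function names and the number of slices are arbitrary choices for illustration, not anything from a textbook.

```python
import math

def lebesgue_sum_x_squared(levels):
    """Lower sum for the integral of x^2 over [0, 1], slicing the Y axis."""
    total = 0.0
    for k in range(levels):
        y_lo = k / levels
        y_hi = (k + 1) / levels
        # {x in [0,1] : y_lo <= x^2 < y_hi} is [sqrt(y_lo), sqrt(y_hi)),
        # so its measure is the length of that interval.
        measure = math.sqrt(y_hi) - math.sqrt(y_lo)
        total += y_lo * measure
    return total

def riemann_sum_x_squared(cells):
    """Left-endpoint Riemann sum for the same integral, slicing the X axis."""
    total = 0.0
    for k in range(cells):
        x_lo = k / cells
        total += (x_lo ** 2) * (1 / cells)
    return total

leb = lebesgue_sum_x_squared(10_000)
rie = riemann_sum_x_squared(10_000)
print(leb, rie)  # both should be close to 1/3
```

For the rational/irrational function f, the Y axis version has only two levels to worry about: the value 1 times the measure of the rationals in [0,1], plus the value 0 times the measure of the irrationals. The first measure is 0, so the sum is 0. The X axis version never settles down because every little interval contains both kinds of points.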

There are at least two now classic books, Royden, Real Analysis and Rudin, Real and Complex Analysis. But there are more, and likely more can be written. For Rudin, can f'get about the last half on functions of a complex variable.

Royden is easier to read. Rudin has the math more succinctly presented. Some people believe that Rudin is a bit too severe for a first version; but if get used to how Rudin writes, he's really good.

There in Rudin get good introductions to Banach space (a complete, normed linear space, that is, assume a little less than for a Hilbert space) with a few really surprising theorems, Hilbert space with an isomorphism argument that, for a given dimension, they are really all the same, a really nice chapter on Fourier theory (Baby Rudin does Fourier series; R&CA does the Fourier integral), and some nice applications.

With that background in measure theory, then take your good pass through probability. So, probability becomes a measure as in measure theory except the values are always real and in [0,1] and the whole space has measure 1. A random variable finally gets a solid, mathematical definition -- it's just a measurable function (measurable is very general; in practice and even in nearly all of theory, essentially every function is measurable; in the usual contexts, it takes some cleverness to think of a function that is not measurable). And expectation is just the measure theory integral (with meager assumptions).
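On a finite sample space all of this collapses to bookkeeping, which makes for a quick sanity check of the definitions. The sketch below uses two fair dice as a made-up example; the names omega, prob, P, X, and expectation are mine, chosen to echo the usual notation.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (a hypothetical example)
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}  # the measure of each single outcome

def P(event):
    """Measure of an event, that is, of a subset of omega."""
    return sum(prob[w] for w in event)

def X(w):
    """A random variable: just a function on the sample space."""
    return w[0] + w[1]

def expectation(rv):
    """Integral of rv with respect to P; on a finite space it is a sum."""
    return sum(rv(w) * prob[w] for w in omega)

total = P(omega)                               # the whole space has measure 1
ex = expectation(X)                            # E[X]
p7 = P([w for w in omega if X(w) == 7])        # P(X = 7)
print(total, ex, p7)
```

A finite space hides the real work, of course: here every function is measurable and every integral is a finite sum. The measure theory is what keeps the same three definitions working when the sample space is the real line or a space of paths.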

And in probability get some assumptions don't see in measure theory -- independence and conditional independence, and these two yield wonders nothing like in just measure theory.

Go somewhere; look at something; get a number; then that's the value of a random variable. In practice, suppose there are 20 random variables, you have numerical values for 19 of them, and you can argue that those 19 are not independent of the 20th and that you have some data on how they are dependent, then you have a shot at estimating the 20th. Presto, bingo, get as rich as James Simons, do machine learning, get big houses, Cadillacs, Ferraris, a yacht, a private jet -- maybe!

Also get to use the Radon-Nikodym theorem (proved in both Royden and Rudin, and in Rudin with von Neumann's cute proof) for the grown up version of conditioning (i.e., Bayesian) and, from there, in stochastic processes, Markov processes and martingales (astounding results).

Books include L. Breiman, Probability, K. Chung, A Course in Probability Theory, J. Neveu, Mathematical Foundations of the Calculus of Probability. And there are more. IIRC Breiman and Neveu were both students of M. Loeve at Berkeley -- sure, can also get Loeve's Probability in two volumes. Of these, Neveu is my favorite; it's elegant; but for most readers it is too succinct.

Hilbert space again? It turns out, the set of all real valued random variables X such that E[X^2] is finite is a Hilbert space. Yes, completeness holds; with some thought, that result seems astounding, like there is no way it could be true, but it is.
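One payoff of that Hilbert space: estimating one random variable from another is a perpendicular projection, with E[XY] playing the role of the inner product. The sketch below is a simulation under loud assumptions: the model Y = 2X + noise, the sample size, and the seed are all invented for illustration, and sample averages stand in for the expectations.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
n = 20000

# Hypothetical data: Y depends on X plus independent noise
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]

def mean(values):
    return sum(values) / len(values)

def inner(u, v):
    """Sample stand-in for the inner product <U, V> = E[UV]."""
    return mean([a * b for a, b in zip(u, v)])

# Center both variables, which projects out the constant random variable 1
mx, my = mean(xs), mean(ys)
xc = [x - mx for x in xs]
yc = [y - my for y in ys]

# Projection of Y onto the line spanned by X: one ratio of inner products
slope = inner(xc, yc) / inner(xc, xc)
residual = [y - slope * x for x, y in zip(xc, yc)]

print(slope)                 # should land near the 2.0 used to build the data
print(inner(residual, xc))   # essentially zero: the error is orthogonal to X
```

This is ordinary least squares seen as geometry: the estimate is the point on the line closest to Y, the error is perpendicular to X, and the Pythagorean theorem splits the variance of Y into an explained part and a leftover part.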

Now just derive grown up versions of most of the main results of elementary statistics for yourself from what you have learned. E.g., for the Neyman-Pearson result on most powerful hypothesis testing, just use the Hahn decomposition from the Radon-Nikodym theorem.

And, with the Radon-Nikodym theorem, get a grown up version of sufficient statistics, right, based on a classic paper by Halmos and Savage.

Along the way will notice that, the last time I looked, Baby Rudin defined the Riemann integral on closed intervals of finite length, but right away probability and statistics want to integrate on the whole real line, the whole plane, etc., do change of variable manipulations with such integrals, etc. Well, for the prerequisites, those are in measure theory where the first version of its integral applies also on the whole real line, the whole plane, and much more. Measure theory also gives you the clean, powerful versions of differentiation under the integral sign and interchange of order of integration.

Ah, why bother to teach the Riemann integral at all? :-)