|
product space. Two examples are the real
line and the 3 space we live in (at least
locally ignoring general relativity). But the biggie points about the Hilbert
space definition are (A) the other,
somewhat astounding (vector space),
examples and (B) how much we can do in
Hilbert space that is close to good old
high school plane (2D) and solid (3D)
geometry and close to good, old freshman
calculus with limits, convergence, etc.
So, in particular, in Hilbert space we get
the Pythagorean theorem of plane geometry
and perpendicular projections as shortest
distance from a point to a plane in solid
geometry -- two biggies for pure/applied
math. We get the triangle inequality.
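
For a quick numerical check of those two biggies, here is a minimal sketch in ordinary R^3 with the usual dot product. The vectors, the plane, and the helper names (inner, norm, orthonormalize, project) are made up for illustration and are not from any particular library; Gram-Schmidt is just one convenient way to get the projection.

```python
import math

def inner(u, v):
    """Ordinary dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Length of u, defined from the inner product."""
    return math.sqrt(inner(u, u))

def orthonormalize(vectors):
    """Gram-Schmidt: turn linearly independent vectors into an orthonormal list."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = norm(w)
        basis.append([wi / n for wi in w])
    return basis

def project(x, vectors):
    """Perpendicular projection of x onto the span of `vectors`."""
    p = [0.0] * len(x)
    for e in orthonormalize(vectors):
        c = inner(x, e)  # the only data used: inner products
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

# Illustrative data (assumed): a point x and a plane through the origin.
x = [1.0, 2.0, 0.0]
plane = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

p = project(x, plane)                   # closest point in the plane
r = [xi - pi for xi, pi in zip(x, p)]   # residual, perpendicular to the plane

print("projection:", p)
print("residual . plane vectors:", inner(r, plane[0]), inner(r, plane[1]))
print("Pythagoras:", norm(x) ** 2, "vs", norm(p) ** 2 + norm(r) ** 2)
```

The residual comes out perpendicular to both spanning vectors, and the squared lengths add up as the Pythagorean theorem says. The same recipe, with a different inner product, is what carries over to the infinite dimensional cases.
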
With the inner product, we get angles and
orthogonality. And the core data we need
for projections are just some inner
products.

Some of the most important examples of a Hilbert
space are the set of real numbers R, the
set of complex numbers C, and, for positive
integer n, the vector spaces of linear
algebra R^n and C^n -- e.g., R^n is the
set of all n-tuples of real numbers. Then
as in whatever first course you had that
talked about vectors and dot (inner,
scalar) products, with those inner
products R^n and C^n are Hilbert spaces.

The part about complete is a
generalization of completeness in the real
numbers, that is, the biggie way the real
numbers are better than just the rational
numbers. In short, for example,
intuitively, in the real numbers, if a
sequence appears to converge, then there
really is a real number there for it to
converge to. Of course, that's not true
in the rational numbers since, e.g., can
have a sequence of rational numbers that
converges to the square root of 2 but
doesn't really converge in the rational
numbers because the square root of 2 is not a
rational number. This stuff about
"appears to converge" is called Cauchy
convergence and is a weak definition of
convergence. The point about completeness
is that Cauchy convergent sequences really
are convergent, that is, have something to
converge to and do converge to it
(essentially in the sense of limits you
saw in calculus or high school algebra).
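
To see "appears to converge" with actual numbers, here is a small sketch using exact rational arithmetic. The choice of the Newton (Babylonian) iteration and the name sqrt2_rationals are mine, just for illustration; any sequence of rationals closing in on the square root of 2 would make the same point. The shrinking gaps between consecutive terms are used here as a simple stand-in for the full Cauchy condition.

```python
from fractions import Fraction

def sqrt2_rationals(n_terms):
    """Newton (Babylonian) iterates x -> (x + 2/x) / 2, kept as exact rationals."""
    x = Fraction(1)
    terms = []
    for _ in range(n_terms):
        terms.append(x)
        x = (x + 2 / x) / 2
    return terms

terms = sqrt2_rationals(6)

# Gaps between consecutive terms: these shrink fast, so the sequence
# "appears to converge".
gaps = [abs(b - a) for a, b in zip(terms, terms[1:])]

# But no term ever squares to exactly 2, so no rational is the limit.
errors = [abs(t * t - 2) for t in terms]

for t, e in zip(terms, errors):
    print(t, "squared misses 2 by", e)
```

Every term is a rational number, the terms crowd together as closely as you like, and yet the only candidate for the limit is not a rational. In the real numbers, completeness says that can't happen.
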
If we are taking limits to define or
approximate what we really want, then we also
really want completeness so that what
we converge to exists. So, that's
completeness -- for a Hilbert space, we
insist on that.

Of course, need a background in linear
algebra. So, for a first book, get any of
the popular ones. If you wish,
concentrate on the more geometrical
parts and do less on the algebraic parts
-- e.g., if there is a chapter on group
representation theory, Galois theory,
linear algebra over finite fields, where
things go wrong when the field is the
rationals, or algebraic coding theory,
then feel free to leave that material for
later. Likely should pay attention to
dual spaces, but if wish can go light on
adjoint transformations since they are
less interesting when have an inner
product. Curiously, can go light on
change of basis for linear
transformations, that is, the difference
between vectors and coordinate vectors.
Concentrate on dimension, linear
independence, linear transformations,
maybe touch on quotient spaces, use Gauss
elimination as a good example of such
things, eigenvalues and eigenvectors,
orthogonality, and the polar decomposition
(the core of factor analysis, singular
value decomposition, matrix condition
number, and more). If there is a little
on the associated numerical analysis, then
go ahead -- e.g., learn to accumulate
inner products in double precision --
sure, take 10 minutes to see how to add
iterative improvement to Gauss elimination
and matrix inversion. For
pseudo-inverses, cute material, but,
especially for your question, likely won't
see it again and can skip it.

Then if have some time, take a fast pass
through the classic, Halmos, Finite
Dimensional Vector Spaces. It was
written in 1942 when Halmos had just
gotten his Ph.D. under J. Doob (e.g., as
in Stochastic Processes and, more
recently, Classical Potential Theory and
Its Probabilistic Counterpart) and showed
up at the Institute for Advanced Study in
Princeton and asked to be an assistant to John
von Neumann, likely the inventor of
Hilbert space. Well, Halmos wrote his
book to be a finite dimensional version of
linear algebra as if it were Hilbert
space, which is commonly infinite
dimensional. So you get a gentle
introduction to Hilbert space. You get
good at eigenvalues and eigenvectors, the
polar decomposition, orthogonality,
transformations that preserve distances
and angles, etc. You get a lot of
geometric intuition.

BTW, at one time, Harvard's Math 55 used
Halmos, Baby Rudin (below), and a book by
Spivak as the three main references.

Get a start on probability and statistics.
A college junior level course should be
sufficient. Don't take the course too
seriously since will redo all the good
parts from a much better foundation soon!
Note: Elementary stat courses commonly get
all wound up about probability
distributions. Well, they do exist, are
at the core of probability theory, and are
very important, both in theory and
conceptually, but, actually, in practice
usually they require more data than you
will likely have, especially in dimensions
above 1. So, in practice, mostly can't
actually see the actual data of the
distributions you are working with! You
should hear about the uniform,
exponential, chi-squared, and Gaussian and
go light on the rest. The Gaussian is
profound and won't go away even in
practice although is less important in
practice than long assumed in, say,
educational statistics.

Then take a good pass through at least the
first parts of Rudin, Principles of
Mathematical Analysis, AKA Baby Rudin.
For the exterior algebra in the back of
the more recent editions, well, likely get
that elsewhere, say, now in English,
directly from Cartan. The first parts of
Baby Rudin cover metric spaces well
enough. So, in Baby Rudin, get good at
working with the limits, completeness
property, compactness, etc. of
mathematical analysis, that is, not
algebra, geometry, topology, or
foundations, although the metric space
material is the same as part of the part
of topology called point set topology.

Then learn more in mathematical analysis,
in particular, measure theory. Measure
theory essentially replaces the Riemann
integral you learned in calculus and Baby
Rudin. 'Bout time! Net, measure theory
is a slightly different way to use limits
to define the integral (areas, volumes,
etc.) -- the first, biggie difference is
that do the partitioning on the Y axis
instead of on the X axis. The biggie
reason: The resulting integral easily
handles the pathological cases, especially
involving limits, that the Riemann
integral struggles with. Don't worry:
The integral of, say, x^2 over [0,1] is
still the same, IIRC, 1/3rd, right? But
consider the function f: [0,1] --> R
where f(x) = 1 if x is rational and 0
otherwise. Then the Riemann integral of f
over [0,1] does not exist, but the measure
theory integral does and gives 0 for the
result.

There are at least two now classic books,
Royden, Real Analysis and Rudin, Real
and Complex Analysis. But there are
more, and likely more can be written. For
Rudin, can f'get about the last half on
functions of a complex variable.

Royden is easier to read. Rudin has the
math more succinctly presented. Some
people believe that Rudin is a bit too
severe for a first version; but if get
used to how Rudin writes, he's really
good.

There in Rudin get good introductions to
Banach space (a complete, normed linear
space, that is, assume a little less than
for a Hilbert space) with a few really
surprising theorems, Hilbert space with an
isomorphism argument that Hilbert spaces of
the same dimension are really all the same,
a really nice chapter on
Fourier theory (Baby Rudin does Fourier
series; R&CA does the Fourier integral),
and some nice applications.

With that background in measure theory,
then take your good pass through
probability. So, probability becomes a
measure as in measure theory except the
values are always real and in [0,1]. A
random variable finally gets a solid,
mathematical definition -- it's just a
measurable function (measurable is very
general; in practice and even in nearly
all of theory, essentially every function
is measurable; in the usual contexts, it
takes some cleverness to think of a
function that is not measurable). And
expectation is just the measure theory
integral (with meager assumptions).

And in probability get some assumptions
don't see in measure theory --
independence and conditional independence,
and these two yield wonders nothing like
in just measure theory.

Go somewhere; look at something; get a
number; then that's the value of a random
variable. In practice, suppose there are
20 random variables, you have numerical
values for 19 of them, and you can argue
that those 19 are not independent of the
20th and that you have some data on how
they are dependent, then you have a shot
at estimating the 20th. Presto, bingo,
get as rich as James Simons, do machine
learning, get big houses, Cadillacs,
Ferraris, a yacht, a private jet -- maybe!

Also get to use the Radon-Nikodym theorem
(proved in both Royden and Rudin, and in
Rudin with von Neumann's cute proof) for
the grown up version of conditioning
(i.e., Bayesian) and, from there, in
stochastic processes, Markov processes and
martingales (astounding results).

Books include L. Breiman, Probability,
K. Chung, A Course in Probability
Theory, J. Neveu, Mathematical
Foundations of the Calculus of
Probability. And there are more. IIRC
Breiman and Neveu were both students of M.
Loeve at Berkeley -- sure, can also get
Loeve's Probability in two volumes. Of
these, Neveu is my favorite; it's elegant;
but for most readers it is too succinct.

Hilbert space again? It turns out, the
set of all real valued random variables X
such that E[X^2] is finite is a Hilbert
space. Yes, completeness holds; with some
thought, that result seems astounding,
like there is no way it could be true, but
it is.

Now just derive grown up versions of most
of the main results of elementary
statistics for yourself from what you have
learned. E.g., for the Neyman-Pearson
result on most powerful hypothesis
testing, just use the Hahn decomposition
from the Radon-Nikodym theorem.

And, with the Radon-Nikodym theorem, get a
grown up version of sufficient statistics,
right, based on a classic paper by Halmos
and Savage.

Along the way will notice that, the last
time I looked, Baby Rudin defined the
Riemann integral on closed intervals of
finite length, but right away probability
and statistics want to integrate on the
whole real line, the whole plane, etc., do
change of variable manipulations with such
integrals, etc. Well, for the
prerequisites, those are in measure theory
where the first version of its integral
applies also on the whole real line, the
whole plane, and much more. Measure
theory also gives you the clean, powerful
versions of differentiation under the
integral sign and interchange of order of
integration.

Ah, why bother to teach the Riemann
integral at all? :-) |
I have a utilitarian understanding of mathematics (vector spaces, SVD, orthogonality, invariances, etc.) and over time have come to appreciate the underlying characteristics/relationships, which I recently got a taste of from T. Wickens, The Geometry of Multivariate Statistics.
I look forward to understanding measure theory and related math.