Part II

(2) Linear Algebra

(2.1) Linear Equations

The start of linear algebra was seen in
high school algebra, solving systems of
linear equations. E.g., we seek numerical
values of x and y so that

     3 x - 2 y = 7
      -x + 2 y = 8
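
As a check on this example, here is a minimal sketch in Python that solves a small square system by elimination with exact fractions, the technique taken up in (2.2). The function name solve_linear_system is my own, not from any library, and the sketch only handles the case of exactly one solution; otherwise it raises an error.

```python
from fractions import Fraction


def solve_linear_system(coeffs, rhs):
    """Solve a square system by Gauss elimination with exact fractions.

    Raises ValueError when there is no unique solution.
    """
    n = len(coeffs)
    # Augmented matrix [coeffs | rhs] with exact rational entries.
    aug = [[Fraction(c) for c in row] + [Fraction(b)]
           for row, b in zip(coeffs, rhs)]

    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry here.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution (none or infinitely many)")
        aug[col], aug[pivot] = aug[pivot], aug[col]

        # Subtract multiples of the pivot row to zero the entries below it.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]

    # Back-substitution, last unknown first.
    solution = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


x, y = solve_linear_system([[3, -2], [-1, 2]], [7, 8])
print(x, y)  # 15/2 31/4
```

For the pair above it gives x = 15/2 and y = 31/4, which can be checked by substituting back into both equations.
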
So, the example above is two equations in
the two unknowns x and y.

Well, for positive integers m and n, we
can have m linear equations in n unknowns
(linear as in that example; I omit a
careful definition here). Then, depending
on the constants, there will be none, one,
or infinitely many solutions.

E.g., likely the central technique of ML
and data science is fitting a linear
equation to data. There the central idea
is the set of normal equations, which are
linear and whose coefficient matrix is,
crucially, symmetric and positive
semi-definite (as covered carefully in
linear algebra).

(2.2) Gauss Elimination

The first technique for attacking linear
equations is Gauss elimination. With it we
can determine if there are none, one, or
infinitely many solutions. If there is one
solution, we can find it. If there are
infinitely many, we can find one solution
and characterize the rest as coming from
arbitrary values of several of the
variables (the free variables).

(2.3) Vectors and Matrices

A nice step forward in working with
systems of linear equations is the subject
of vectors and matrices.

A good start is just

     3 x - 2 y = 7
      -x + 2 y = 8

we saw above. What we do is just rip out
the x and y, call that pair a vector,
leave the constants on the left as a
matrix, and regard the constants on the
right side as another vector. Then the
left side becomes the matrix theory
product of the matrix of the constants
and the vector of the unknowns x and y.

The matrix will have two rows and two
columns, written roughly as in

     /         \
    |   3  -2   |
    |           |
    |  -1   2   |
     \         /
So, this matrix is said to be 2 x 2 (2 by
2).

Sure, for positive integers m and n, we
can have a matrix that is m x n (m by n),
which means m rows and n columns.

The vector of the unknowns x and y is 2 x
1 and is written

     /     \
    |   x   |
    |       |
    |   y   |
     \     /
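
To make these two objects concrete, here is a small sketch in Python that holds the 2 x 2 matrix and a 2 x 1 vector as plain lists and forms the usual row-by-column product. The names mat_vec, coeffs, and unknowns are my own, purely for illustration. The vector is filled with x = 15/2 and y = 31/4, the values that satisfy the two equations.

```python
from fractions import Fraction


def mat_vec(matrix, vector):
    """Row-by-column product of an m x n matrix and an n x 1 vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row needs as many entries as the vector")
    # Entry i of the result: multiply row i by the vector, term by term, and add.
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


coeffs = [[3, -2],
          [-1, 2]]
unknowns = [Fraction(15, 2), Fraction(31, 4)]  # x and y for the example

print(mat_vec(coeffs, unknowns))  # [Fraction(7, 1), Fraction(8, 1)]
```

The product reproduces the right sides 7 and 8, so the matrix times the vector of unknowns is just the left sides of the original equations.
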
So, we can say that the matrix is A; the
unknowns are the components of vector v;
the right side is vector b; and that the
system of equations is

     Av = b

where Av is the matrix product of A and
v. How is this product defined? It is
defined to give us just what we had with
the equations we started with: component
i of Av is the sum, over the columns j, of
the entry of A in row i and column j times
component j of v. In our example the first
component is 3 x - 2 y. I'm omitting a
more careful definition here.

So, we use a matrix and two vectors as new
notation to write our system of linear
equations. That's the start of matrix
theory.

It turns out that our new notation is
another pillar of civilization.

Given an m x n matrix A and an n x p matrix
B, we can form the m x p matrix product
AB. Amazingly, this product is
associative. That is, if we have a p x q
matrix C, then we can form the m x q
product

     ABC = (AB)C = A(BC)

It turns out this fact is profound and
powerful.

The proof is based on interchanging the
order of two summation signs, and that
fact generalizes.

Matrix product is the first good example
of a linear operator in a linear
system. The world is awash in linear
systems. There is a lot on linear
operators, e.g., Dunford and Schwartz,
Linear Operators. Electronic
engineering, acoustics, and quantum
mechanics are awash in linear operators.

To build a model of the real world, for
ML, AI, data science, etc., the
obvious first cut is to build a linear
system.

And if one linear system does not fit very
well, then we can use several in patches
of some kind.

(2.4) Vector Spaces

For the set of real numbers R and a
positive integer n, consider the set V of
all n x 1 vectors of real numbers. Then V
is a vector space. We can write out the
definition of a vector space and see that
the set V does satisfy that definition.
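
For reference, a sketch of what writing out that definition involves, in the standard textbook form (the symbols u, v, w for vectors in V and a, b for real numbers are my choice of notation). V must be closed under vector addition and under multiplication by real numbers, and the following eight rules must hold for all u, v, w in V and all real a, b:

```latex
\begin{align*}
&u + v = v + u, &&(u + v) + w = u + (v + w),\\
&u + 0 = u, &&u + (-u) = 0,\\
&a(bu) = (ab)u, &&1u = u,\\
&a(u + v) = au + av, &&(a + b)u = au + bu.
\end{align*}
```

For n x 1 vectors with componentwise addition and scaling, each rule reduces to the same rule for real numbers in every component.
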
That set V is the first vector space we
get to consider.

But we encounter lots more vector spaces;
e.g., in 3 dimensions, a 2 dimensional
plane through the origin is also a vector
space.

Gee, I mentioned dimension; we need a
good definition and a lot of associated
theorems. Linear algebra has those.

So, for matrix A, vector x, and vector of
zeros 0, the set of all solutions x to

     Ax = 0

is a vector space (the null space of A),
and it and its dimension are central in
what we get in many applications, e.g., at
the end of Gauss elimination, fitting
linear equations to data, etc.

(2.5) Eigen Values, Vectors

Eigen in German translates to English as
own, characteristic, peculiar, or some
such.

Well, for an n x n matrix A, we might have
that

     Ax = lx

for some number l and some nonzero vector
x. In this case what matrix A does to
vector x is just scale it by l, keeping it
on the same line through the origin (with
the direction reversed if l is negative).
So, l and x are quite special. Then l is
an eigenvalue of A, and x is a
corresponding eigenvector of A.

These eigen quantities are central to the
crucial singular value decomposition, the
polar decomposition, principal components,
etc.

(2.6) Texts

A good, now quite old, intermediate text
in linear algebra is by Hoffman and Kunze,
IIRC now available for free as PDF on the
Internet.

A special, advanced linear algebra text is
P. Halmos, Finite Dimensional Vector
Spaces written in 1942 when Halmos was an
assistant to John von Neumann at the
Institute for Advanced Study. The text is
an elegant finite dimensional introduction
to infinite dimensional Hilbert space.

At

     http://www.american.com/archive/2008/march-april-magazine-co...

is an entertaining article about Harvard's
course Math 55. At one time that course
used that book by Halmos and also, see
below, Baby Rudin.

For more there is

     Richard Bellman, Introduction to Matrix
     Analysis.

     Horn and Johnson, Matrix Analysis.

There is much more, e.g., on numerical
methods. There a good start is LINPACK,
the software, associated documentation,
and references.

(5) More

The next two topics would be probability
theory and statistics.

For a first text in either of these two,
I'd suggest you find several leading
research universities, call their math
departments, and find what texts they are
using for their first courses in
probability and statistics. I'd suggest
you get the three most recommended texts,
carefully study the most recommended one,
and use the other two for reference.

Similarly for calculus and linear algebra.

For more, that would take us into an
undergraduate math major. Again, make some
phone calls for a list of recommended
texts. One of those might be

     W. Rudin, Principles of Mathematical
     Analysis.

aka "Baby Rudin". It's highly precise
and challenging.

For more,

     H. Royden, Real Analysis

     W. Rudin, Real and Complex Analysis

     L. Breiman, Probability

     M. Loeve, Probability Theory

     J. Neveu, Mathematical Foundations of the
     Calculus of Probability

The last two are challenging.

For Bayesian methods, that's conditional
expectation from the Radon-Nikodym theorem
with a nice proof by John von Neumann in
Rudin's Real and Complex Analysis.

After those texts, you can often derive
the main results of statistics on your own
or just use Wikipedia a little. E.g., for
the Neyman-Pearson result in statistical
hypothesis testing, there is a nice proof
from the Hahn decomposition from the
Radon-Nikodym theorem.