Part II

(2) Linear Algebra

(2.1) Linear Equations

The start of linear algebra was seen in high school algebra, solving systems of linear equations.

E.g., we seek numerical values of x and y so that

     3 x - 2 y = 7

     -x  + 2 y = 8

So, that is two equations in the two unknowns x and y.

Well, for positive integers m and n, we can have m linear equations in n unknowns (the example above is linear; a careful definition is omitted here).

Then depending on the constants, there will be none, one, or infinitely many solutions.

E.g., likely the central technique of ML and data science is fitting a linear equation to data. There the central idea is the set of normal equations, which are linear (and, crucially, have a coefficient matrix that is symmetric and positive semi-definite, as covered carefully in linear algebra).
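As a rough illustration of the normal equations, here is a minimal sketch that fits a line y = a + b t to four made-up data points. The data, the function names normal_equations and solve_2x2, and the use of Cramer's rule for the 2 x 2 solve are all assumptions for the example, not anything from a particular library.

```python
from fractions import Fraction

def normal_equations(ts, ys):
    """Build G = X^T X and c = X^T y for the model y ~ a + b*t.

    X is the (implicit) matrix whose rows are (1, t_i).
    """
    n = len(ts)
    st = sum(ts)
    stt = sum(t * t for t in ts)
    sy = sum(ys)
    sty = sum(t * y for t, y in zip(ts, ys))
    G = [[n, st], [st, stt]]   # symmetric by construction
    c = [sy, sty]
    return G, c

def solve_2x2(G, c):
    """Solve the 2 x 2 system G (a, b) = c by Cramer's rule."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    if det == 0:
        raise ValueError("normal equations are singular for this data")
    a = Fraction(c[0] * G[1][1] - G[0][1] * c[1]) / det
    b = Fraction(G[0][0] * c[1] - G[1][0] * c[0]) / det
    return a, b

# Hypothetical data, chosen only to keep the arithmetic small.
ts = [0, 1, 2, 3]
ys = [1, 2, 2, 4]

G, c = normal_equations(ts, ys)
a, b = solve_2x2(G, c)
print(G)      # the coefficient matrix, equal to its own transpose
print(a, b)   # prints 9/10 9/10
```

The matrix G is X^T X, so for any vector z the number z^T G z is the squared length of Xz, which is never negative. That is where the positive semi-definite property comes from.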

(2.2) Gauss Elimination

The first technique for attacking linear equations is Gauss elimination. With it we can determine whether there are none, one, or infinitely many solutions. For one solution, we can find it. For infinitely many solutions, we can find one solution and characterize the rest in terms of arbitrary values of several of the variables.
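Here is a minimal sketch of that procedure, applied to the two equations from (2.1). It uses the Gauss-Jordan variant (eliminating above and below each pivot) with exact fractions. The function name gauss_solve and its return convention are assumptions made for the illustration.

```python
from fractions import Fraction

def gauss_solve(A, b):
    """Classify and solve A x = b by Gauss-Jordan elimination.

    Returns (kind, x) where kind is "none", "one", or "infinite".
    For "infinite", x is one particular solution with the free
    variables set to 0.
    """
    m, n = len(A), len(A[0])
    # Augmented matrix [A | b] in exact rational arithmetic.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    pivot_cols = []
    r = 0  # next pivot row
    for c in range(n):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: variable c is free
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        # Clear column c in every other row.
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    # Rows below the last pivot have all-zero coefficients; a nonzero
    # right side there means 0 = nonzero, so no solution exists.
    for row in M[r:]:
        if row[n] != 0:
            return "none", None
    x = [Fraction(0)] * n
    for i, c in enumerate(pivot_cols):
        x[c] = M[i][n]
    kind = "one" if len(pivot_cols) == n else "infinite"
    return kind, x

kind, x = gauss_solve([[3, -2], [-1, 2]], [7, 8])
print(kind, x[0], x[1])   # prints one 15/2 31/4
```

So x = 15/2 and y = 31/4, which can be checked by substituting back: 3(15/2) - 2(31/4) = 7 and -(15/2) + 2(31/4) = 8.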

(2.3) Vectors and Matrices

A nice step forward in working with systems of linear equations is the subject of vectors and matrices.

A good start is just

     3 x - 2 y = 7

     -x  + 2 y = 8

we saw above. What we do is just rip out the x and y, call that pair a vector, leave the constants on the left as a matrix, and regard the constants on the right side as another vector. Then the left side becomes the matrix theory product of the matrix of the constants and the vector of the unknowns x and y.

The matrix will have two rows and two columns written roughly as in

   /         \
   |  3   -2 |
   |         |
   | -1    2 |
   \         /

So, this matrix is said to be 2 x 2 (2 by 2).

Sure, for positive integers m and n, we can have a matrix that is m x n (m by n) which means m rows and n columns.

The vector of the unknowns x and y is 2 x 1 and is written

   /   \
   | x |
   |   |
   | y |
   \   /

So, we can say that the matrix is A; the unknowns are the components of vector v; the right side is vector b; and that the system of equations is

     Av = b

where the Av is the matrix product of A and v. How is this product defined? It is defined to give us just what we had with the equations we started with -- here omitting a careful definition.
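To make the product concrete, here is a small sketch: each component of Av is one row of A multiplied term by term with v and summed, which reproduces the left side of one equation. The helper name mat_vec is an assumption for the example, and the values x = 15/2, y = 31/4 are the solution of the two equations above.

```python
from fractions import Fraction

def mat_vec(A, v):
    """Matrix times vector: one sum of products per row of A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[3, -2],
     [-1, 2]]
v = [Fraction(15, 2), Fraction(31, 4)]   # x and y

b = mat_vec(A, v)
# First component is 3*x - 2*y, second is -x + 2*y.
print(b == [7, 8])   # prints True
```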

So, we use a matrix and two vectors as new notation to write our system of linear equations. That's the start of matrix theory.

It turns out that our new notation is another pillar of civilization.

Given an m x n matrix A and an n x p matrix B, we can form the m x p matrix product AB. Amazingly, this product is associative. That is, if we have a p x q matrix C, then we can form the m x q product

ABC = (AB)C = A(BC)

It turns out this fact is profound and powerful.

The proof is based on interchanging the order of two summation signs, and that fact generalizes.
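A quick numerical check of associativity, as a sketch: the matrices below are arbitrary made-up examples with shapes 2 x 3, 3 x 2, and 2 x 4, and mat_mul is a hypothetical helper written for the illustration. A check on one example is of course not a proof.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    p = len(B[0])
    # Entry (i, j) is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]            # 2 x 3
B = [[1, 0],
     [2, -1],
     [0, 3]]               # 3 x 2
C = [[2, 1, 0, -1],
     [1, 1, 1, 1]]         # 2 x 4

left = mat_mul(mat_mul(A, B), C)    # (AB)C
right = mat_mul(A, mat_mul(B, C))   # A(BC)
print(left == right)   # prints True
```

Entry (i, j) of (AB)C sums A[i][k] * B[k][l] * C[l][j] over k first and then l, while A(BC) sums over l first and then k. Swapping the order of those two finite sums gives the same number.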

Matrix product is the first good example of a linear operator in a linear system. The world is awash in linear systems. There is a lot on linear operators, e.g., Dunford and Schwartz, Linear Operators. Electronic engineering, acoustics, and quantum mechanics are awash in linear operators.

To build a model of the real world, for ML, AI, data science, etc., the obvious first cut is to build a linear system.

And if one linear system does not fit very well, then we can use several in patches of some kind.

(2.4) Vector Spaces

For the set of real numbers R and a positive integer n, consider the set V of all n x 1 vectors of real numbers. Then V is a vector space. We can write out the definition of a vector space and see that the set V does satisfy that definition. That's the first vector space we get to consider.

But we encounter lots more vector spaces; e.g., in 3 dimensions, a 2 dimensional plane through the origin is also a vector space.

Gee, I mentioned dimension; we need a good definition and a lot of associated theorems. Linear algebra has those.

So, for matrix A, vector x, and vector of zeros 0, the set of all solutions x to

Ax = 0

is a vector space, and it and its dimension are central in what we get in many applications, e.g., at the end of Gauss elimination, fitting linear equations to data, etc.
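A small sketch of why the solution set of Ax = 0 is a vector space: sums and scalar multiples of solutions are again solutions. The matrix A and the two solutions x1 and x2 below are made up for the illustration, and mat_vec is a hypothetical helper.

```python
def mat_vec(A, v):
    """Matrix times vector: one sum of products per row of A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Made-up example: the second row is twice the first.
A = [[1, 2, 3],
     [2, 4, 6]]

x1 = [2, -1, 0]   # one solution of A x = 0
x2 = [3, 0, -1]   # another solution

# An arbitrary linear combination 5*x1 - 2*x2.
combo = [5 * a - 2 * b for a, b in zip(x1, x2)]

print(mat_vec(A, x1))      # prints [0, 0]
print(mat_vec(A, x2))      # prints [0, 0]
print(mat_vec(A, combo))   # prints [0, 0]
```

In this example the two rows of A give only one independent equation in three unknowns, so the solution space has dimension 2, and x1 and x2 span it.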

(2.5) Eigenvalues, Eigenvectors

Eigen in German translates to English as own, characteristic, peculiar, or some such.

Well, for an n x n matrix A, we might have that

Ax = lx

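for number l and nonzero vector x. In this case what matrix A does to vector x is just scale it by l, keeping it on the same line through the origin (for real l, the length changes by the factor |l| and the direction reverses if l is negative). So, l and x are quite special. Then l is an eigenvalue of A, and x is a corresponding eigenvector of A.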
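As a sketch, the 2 x 2 matrix from (2.3) happens to have eigenvalues 4 and 1, which can be checked directly from the definition. The helper names mat_vec and is_eigenpair are assumptions for the illustration. The code only verifies candidate pairs; it does not compute eigenvalues.

```python
def mat_vec(A, v):
    """Matrix times vector: one sum of products per row of A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, l, x):
    """True if x is nonzero and A x equals l times x."""
    if all(xi == 0 for xi in x):
        return False  # the zero vector is excluded by definition
    return mat_vec(A, x) == [l * xi for xi in x]

A = [[3, -2],
     [-1, 2]]

print(is_eigenpair(A, 4, [2, -1]))   # prints True:  A x = (8, -4)
print(is_eigenpair(A, 1, [1, 1]))    # prints True:  A x = (1, 1)
print(is_eigenpair(A, 2, [1, 0]))    # prints False: A x = (3, -1)
```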

These eigen quantities are central to the crucial singular value decomposition, the polar decomposition, principal components, etc.

(2.6) Texts

A good, now quite old, intermediate text in linear algebra is by Hoffman and Kunze, IIRC now available for free as PDF on the Internet.

A special, advanced linear algebra text is P. Halmos, Finite Dimensional Vector Spaces written in 1942 when Halmos was an assistant to John von Neumann at the Institute for Advanced Study. The text is an elegant finite dimensional introduction to infinite dimensional Hilbert space.

http://www.american.com/archive/2008/march-april-magazine-co...

is an entertaining article about Harvard's course Math 55. At one time that course used that book by Halmos and also, see below, Baby Rudin.

For more there is

Richard Bellman, Introduction to Matrix Analysis.

Horn and Johnson, Matrix Analysis.

There is much more, e.g., on numerical methods. There a good start is LINPACK, the software, associated documentation, and references.

(5) More

The next two topics would be probability theory and statistics.

For a first text in either of these two, I'd suggest you find several leading research universities, call their math departments, and find what texts they are using for their first courses in probability and statistics. I'd suggest you get the three most recommended texts, carefully study the most recommended one, and use the other two for reference.

Similarly for calculus and linear algebra.

For more, that would take us into a ugrad math major. Again, make some phone calls for a list of recommended texts. One of those might be

W. Rudin, Principles of Mathematical Analysis.

aka, "Baby Rudin". It's highly precise and challenging.

For more,

H. Royden, Real Analysis

W. Rudin, Real and Complex Analysis

L. Breiman, Probability

M. Loeve, Probability Theory

J. Neveu, Mathematical Foundations of the Calculus of Probability

The last two are challenging.

For Bayesian methods, the foundation is conditional expectation, which comes from the Radon-Nikodym theorem; Rudin's Real and Complex Analysis has a nice proof of that theorem by John von Neumann.

After those texts, you can often derive the main results of statistics on your own or just use Wikipedia a little. E.g., for the Neyman-Pearson result in statistical hypothesis testing, there is a nice proof from the Hahn decomposition from the Radon-Nikodym theorem.