Part I
(1) Calculus
Generally, you should have college freshman and sophomore calculus.
(1.1) Functions
There you can better understand what a function is. E.g., the function f(x) = 3x^2 + 1.
(1.2) Derivatives
Then you will learn how to find the slope of the graph of a function at each point. That slope is the derivative of the function. E.g., for the function f with f(x) = 3x + 2, as in high school algebra, the slope is 3, so for each x the derivative of f at x is just 3. The derivative of function f at x is denoted by either of
f'(x) or d/dx f(x)
E.g., for the function f(x) = 3x^2 + 1 it turns out that f'(x) = 6x.
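To make "slope at a point" concrete, here is a minimal Python sketch (not from the original) that approximates a derivative with a central difference, the slope of a short chord from x - h to x + h. The helper name `derivative` and the step h = 1e-5 are illustrative assumptions; for f(x) = 3x^2 + 1 the chord slope matches f'(x) = 6x up to rounding.

```python
def f(x: float) -> float:
    """The example function f(x) = 3x^2 + 1."""
    return 3 * x**2 + 1

def derivative(func, x: float, h: float = 1e-5) -> float:
    """Approximate func'(x) by the slope of the chord from x - h to x + h.

    As h shrinks this slope approaches the derivative; h = 1e-5 is an
    arbitrary choice that balances truncation and rounding error.
    """
    return (func(x + h) - func(x - h)) / (2 * h)

for x in [0.0, 1.0, 2.5]:
    # Compare the numerical slope with the exact answer f'(x) = 6x.
    print(x, derivative(f, x), 6 * x)

# For the straight line 3x + 2 the slope is 3 everywhere.
print(derivative(lambda x: 3 * x + 2, 7.0))
```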
(1.3) Integration
For the function g(x) = 6x, maybe we want to know what function f(x) will give us
f'(x) = g(x)
Finding such a function f is anti-differentiation, that is, it undoes differentiation. So, sure, f(x) = 3x^2 + C works for any constant C.
Such anti-differentiation is also the way to find the area under a curve. So, we can use it to find the area of a circle, the volume of a cylinder, etc. Used this way, anti-differentiation is integration. The fundamental theorem of calculus shows how differentiation and integration are related: for continuous g and any anti-derivative f of g, the integral of g from a to b is f(b) - f(a).
(1.4) Analytic Geometry
Commonly taught at the beginning of a calculus course is analytic geometry. So, take a (double) cone and cut it with a plane. Then the cut will be one of a circle, an ellipse, a parabola, a hyperbola, or just two crossed straight lines. So, those curves come from a cone and are the conic sections. There is some simple associated algebra.
Conic sections come up again and again; e.g., applied math is awash in circles; the planets move in ellipses; a baseball moves in a parabola, or nearly so; an electron moving toward a negative charge will turn away from that charge along a hyperbola.
It turns out that in linear algebra (below) circles and ellipses are important.
(1.5) Role of Calculus
Calculus was invented by Newton as part of working with force and acceleration to understand the motion of the planets (Leibniz developed it independently at about the same time).
E.g., if function d(t) gives the distance traveled by time t, then the function v(t) = d'(t) is the velocity at time t and the function a(t) = v'(t) is the acceleration at time t. Then Newton's second law is
F(t) = m a(t)
where F(t) is the force at time t applied to mass m.
Calculus is the first approach to the analysis of continuous change and is a pillar of civilization. Knowledge of calculus will commonly be assumed in work in ML/AI, data science, statistics, optimization, applied math, engineering, etc.
E.g., a lot in ML, AI, and data science is getting best fits to data; best fitting means minimizing the errors in the fit; such minimization is mostly a calculus problem; one of the main techniques in ML is steepest descent (gradient descent), which repeatedly steps in the direction opposite the gradient, the vector of partial derivatives.
Probability theory (e.g., evaluating coin tossing, poker hands, accuracy in ML) will be important in ML/AI, etc.; two of the basic notions in probability are cumulative distribution functions and probability density functions; for a distribution with a density, the cumulative distribution is an integral of the density, and the density is the derivative of the cumulative distribution.
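The link between a density and its cumulative distribution can be checked numerically. The Python sketch below is an illustration only: it assumes an exponential distribution with rate 1 (density e^(-x) for x >= 0), integrates the density with the midpoint rule to get the cumulative, then differentiates the closed-form cumulative with a central difference to get the density back. The helper names and the step counts are hypothetical choices.

```python
import math

def density(x: float) -> float:
    """Exponential density with rate 1 (an illustrative assumption)."""
    return math.exp(-x) if x >= 0 else 0.0

def exact_cdf(x: float) -> float:
    """Closed-form cumulative distribution for the same density."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def cdf_by_integration(x: float, n: int = 10_000) -> float:
    """Cumulative at x: integrate the density from 0 to x with the
    midpoint rule on n equal subintervals."""
    if x <= 0:
        return 0.0
    h = x / n
    return h * sum(density((i + 0.5) * h) for i in range(n))

def density_by_differentiation(cdf, x: float, h: float = 1e-5) -> float:
    """Density at x: central-difference derivative of the cumulative."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)

print(cdf_by_integration(1.0), exact_cdf(1.0))                   # both about 0.632
print(density_by_differentiation(exact_cdf, 1.0), density(1.0))  # both about 0.368
```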
(2) Linear Algebra
(2.1) Linear Equations
The start of linear algebra was seen in high school algebra, solving systems of linear equations.
E.g., we seek numerical values of x and y so that
a11 x + a12 y = b1
a21 x + a22 y = b2
for given constants a11, a12, a21, a22, b1, b2.
So, that is two equations in the two unknowns x and y.
Well, for positive integers m and n, we can have m linear equations in n unknowns (the equations above are linear, but here we omit a careful definition).
Then depending on the constants, there will be none, one, or infinitely many solutions.
E.g., likely the central technique of ML and data science is fitting a linear equation to data. There the central idea is the set of normal equations, which are linear (and, crucially, have a matrix that is symmetric and positive semi-definite, as covered carefully in linear algebra).
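As a small illustration of the normal equations, here is a Python sketch that fits a line y = m x + c by least squares. With a design matrix X whose rows are [x_i, 1], the normal equations are (X^T X)[m, c]^T = X^T y, a 2 x 2 symmetric system solved here by Cramer's rule. The function name `fit_line` and the data points (which lie exactly on y = 2x + 1) are hypothetical, chosen so the answer is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c via the 2 x 2 normal equations.

    X^T X = [[sxx, sx], [sx, n]] is symmetric and positive semi-definite;
    it is invertible as long as the x values are not all equal.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("x values are all equal; the fit is not unique")
    # Cramer's rule for [[sxx, sx], [sx, n]] [m, c]^T = [sxy, sy]^T
    m = (sxy * n - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

With noisy data the same function returns the line minimizing the sum of squared vertical errors; for larger models one would solve the normal equations (or, more stably, a QR-based least-squares routine) with a numerical library.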
(2.2) Gauss Elimination
The first technique for attacking linear equations is Gauss elimination. With it we can determine whether there are no solutions, exactly one, or infinitely many. When there is exactly one solution, we can find it. When there are infinitely many, we can find one solution and characterize the rest as determined by arbitrary values of several of the variables (the free variables).
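A minimal Python sketch of Gauss elimination (in the Gauss-Jordan form that reduces all the way to reduced row echelon form) on the augmented matrix [A | b]. It reports which of the three cases holds; in the infinite case it returns one particular solution (free variables set to 0) and the indices of the free variables. The name `gauss_solve` and its return format are assumptions for illustration, and exact `Fraction` arithmetic is used to sidestep rounding.

```python
from fractions import Fraction

def gauss_solve(A, b):
    """Return ("none", None), ("one", x), or ("infinite", (x0, free))."""
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for col in range(n):
        # Find a row at or below r with a nonzero entry in this column.
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue  # no pivot here: this variable will be free
        M[r], M[p] = M[p], M[r]
        piv = M[r][col]
        M[r] = [v / piv for v in M[r]]  # scale so the pivot is 1
        for i in range(m):  # clear this column from every other row
            if i != r and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivot_cols.append(col)
        r += 1
        if r == m:
            break
    # Remaining rows have all-zero coefficients; "0 = nonzero" means no solution.
    if any(M[i][n] != 0 for i in range(r, m)):
        return "none", None
    x0 = [Fraction(0)] * n
    for i, col in enumerate(pivot_cols):
        x0[col] = M[i][n]
    free = [j for j in range(n) if j not in pivot_cols]
    return ("one", x0) if not free else ("infinite", (x0, free))

print(gauss_solve([[1, 1], [1, -1]], [3, 1]))  # one solution: x = 2, y = 1
print(gauss_solve([[1, 1], [2, 2]], [3, 7]))   # none
print(gauss_solve([[1, 1], [2, 2]], [3, 6]))   # infinitely many, y free
```

Production solvers use floating point with pivoting chosen for numerical stability (as in LINPACK, mentioned below); exact fractions just keep this sketch's case analysis unambiguous.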
(2.3) Vectors and Matrices
A nice step forward in working with systems of linear equations is the subject of vectors and matrices.
A good start is just the system
a11 x + a12 y = b1
a21 x + a22 y = b2
we saw above. What we do is just rip out the x and y and call that pair a vector, collect the constants on the left side into a matrix, and regard the constants on the right side as another vector. Then the left side becomes the matrix product of the matrix of the constants and the vector of the unknowns x and y.
The matrix will have two rows and two columns, written roughly as in
[ a11  a12 ]
[ a21  a22 ]
So, this matrix is said to be 2 x 2 (2 by 2).
Sure, for positive integers m and n, we can have a matrix that is m x n (m by n), which means m rows and n columns.
The vector of the unknowns x and y is 2 x 1 and is written
[ x ]
[ y ]
So, we can say that the matrix is A; the unknowns are the components of vector v; the right side is vector b; and the system of equations is
Av = b
where Av is the matrix product of A and v. How is this product defined? It is defined to give us just what we had with the equations we started with; e.g., the first component of Av is a11 x + a12 y (here omitting a careful definition).
So, we use a matrix and two vectors as new notation to write our system of linear equations. That's the start of matrix theory.
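The matrix-vector product can be sketched in a few lines of Python: component i of Av is row i of A dotted with v, which is exactly the left side of equation i. The system 2x + 3y = 8, x - y = -1 below is a hypothetical example (not from the text), picked so that x = 1, y = 2 is its solution.

```python
def mat_vec(A, v):
    """Matrix-vector product: component i is row i of A dotted with v."""
    if any(len(row) != len(v) for row in A):
        raise ValueError("each row of A needs one entry per component of v")
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

# Hypothetical system:  2x + 3y = 8,  x - y = -1
A = [[2, 3],
     [1, -1]]
b = [8, -1]
v = [1, 2]  # candidate values x = 1, y = 2
print(mat_vec(A, v) == b)  # True: (x, y) = (1, 2) solves Av = b
```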
It turns out that our new notation is another pillar of civilization.
Given an m x n matrix A and an n x p matrix B, we can form the m x p matrix product AB. Amazingly, this product is associative. That is, if we also have a p x q matrix C, then we can form the m x q product
ABC = (AB)C = A(BC)
It turns out this fact is profound and powerful.
The proof is based on interchanging the order of two summation signs, and that fact generalizes.
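In index form, entry (i, l) of (AB)C is sum over j of (sum over k of A[i][k] B[k][j]) C[j][l], and swapping the two sums gives sum over k of A[i][k] (sum over j of B[k][j] C[j][l]), which is entry (i, l) of A(BC). The Python sketch below defines the product by that sum and checks associativity on small integer matrices; the matrices are arbitrary illustrative values and `mat_mul` is a hypothetical helper name.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows).

    Entry (i, j) is the sum over k of A[i][k] * B[k][j].
    """
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

# A is 2 x 3, B is 3 x 2, C is 2 x 2 (arbitrary values).
A = [[1, 2, 0],
     [0, 1, 3]]
B = [[1, 0],
     [2, 1],
     [0, 4]]
C = [[1, 1],
     [0, 2]]
print(mat_mul(mat_mul(A, B), C))  # [[5, 9], [2, 28]]
print(mat_mul(A, mat_mul(B, C)))  # [[5, 9], [2, 28]], the same
```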
Matrix product is the first good example of a linear operator in a linear system. The world is awash in linear systems. There is a lot on linear operators, e.g., Dunford and Schwartz, Linear Operators. Electronic engineering, acoustics, and quantum mechanics are awash in linear operators.
To build a model of the real world, for ML, AI, data science, etc., the obvious first cut is to build a linear system.
And if one linear system does not fit very well, then we can use several in patches of some kind.
(2.4) Vector Spaces
For the set of real numbers R and a positive integer n, consider the set V of all n x 1 vectors of real numbers. Then V is a vector space. We can write out the definition of a vector space and see that the set V does satisfy that definition. That's the first vector space we get to consider.
But we encounter lots more vector spaces; e.g., in 3 dimensions, a 2 dimensional plane through the origin is also a vector space.
Gee, I mentioned dimension; we need a good definition and a lot of associated theorems. Linear algebra has those.
So, for matrix A, vector x, and vector of zeros 0, the set of all solutions x to
Ax = 0
is a vector space (the null space of A), and it and its dimension are central in what we get in many applications, e.g., at the end of Gauss elimination, fitting linear equations to data, etc.
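A minimal Python sketch of how Gauss elimination exposes this vector space: reduce A to reduced row echelon form, then build one basis vector per free variable by setting that variable to 1, the other free variables to 0, and solving for the pivot variables. So the dimension is n minus the number of pivots (the rank). The helper name `null_space_basis` and the example matrix are illustrative assumptions.

```python
from fractions import Fraction

def null_space_basis(A):
    """Basis (list of vectors) for the solutions x of Ax = 0."""
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] for row in A]
    pivot_cols, r = [], 0
    for col in range(n):  # reduce to reduced row echelon form
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][col]
        M[r] = [v / piv for v in M[r]]
        for i in range(m):
            if i != r and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivot_cols.append(col)
        r += 1
        if r == m:
            break
    basis = []
    for f in (j for j in range(n) if j not in pivot_cols):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)  # this free variable is 1, the others 0
        for i, col in enumerate(pivot_cols):
            x[col] = -M[i][f]  # row i reads x[col] + M[i][f] * x[f] = 0
        basis.append(x)
    return basis

# Rank 1 matrix with 3 columns, so the solution space has dimension 2.
print(null_space_basis([[1, 2, 3], [2, 4, 6]]))  # vectors (-2, 1, 0) and (-3, 0, 1)
```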
(2.5) Eigen Values, Vectors
Eigen in German translates to English roughly as own, characteristic, or peculiar (to).
Well, for an n x n matrix A, we might have that
Ax = lx
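for a number l and a nonzero vector x. In this case what matrix A does to vector x is just multiply it by the number l: Ax stays on the same line through the origin as x, with its length scaled by |l| (and its direction reversed if l < 0). So, l and x are quite special. Then l is an eigenvalue of A, and x is a corresponding eigenvector of A.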
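One simple way to find an eigenvalue numerically is power iteration, a standard method the text above does not mention: repeatedly apply A to a vector and rescale it to length 1, so the component along the eigenvector with the largest |l| comes to dominate. The Python sketch below assumes such a strictly dominant eigenvalue exists and that the (arbitrary) starting vector has a nonzero component along its eigenvector; the function name and the example matrix are illustrative.

```python
import math

def power_iteration(A, iters=100):
    """Estimate the eigenvalue l of largest magnitude and an eigenvector x.

    Assumes one eigenvalue is strictly largest in magnitude and that the
    starting vector below has a nonzero component along its eigenvector.
    """
    n = len(A)
    x = [1.0] + [0.0] * (n - 1)  # arbitrary starting vector
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # rescale to length 1
    # Rayleigh quotient x . (Ax) estimates l, since x has length 1.
    ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(xi * axi for xi, axi in zip(x, ax))
    return lam, x

lam, x = power_iteration([[2.0, 1.0], [1.0, 2.0]])
print(lam, x)  # about 3.0 and about [0.7071, 0.7071]
```

Here A has eigenvalues 3 and 1, so after k steps the unwanted component has shrunk by a factor of about 3^k; library routines (e.g., those behind principal components or the singular value decomposition) use more robust algorithms.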
These eigen quantities are central to the crucial singular value decomposition, the polar decomposition, principal components, etc.
(2.6) Texts
A good, now quite old, intermediate text in linear algebra is by Hoffman and Kunze, IIRC now available for free as PDF on the Internet.
A special, advanced linear algebra text is P. Halmos, Finite Dimensional Vector Spaces written in 1942 when Halmos was an assistant to John von Neumann at the Institute for Advanced Study. The text is an elegant finite dimensional introduction to infinite dimensional Hilbert space.
At
http://www.american.com/archive/2008/march-april-magazine-co...
is an entertaining article about Harvard's course Math 55. At one time that course used that book by Halmos and also, see below, Baby Rudin.
For more there is
Richard Bellman, Introduction to Matrix Analysis.
Horn and Johnson, Matrix Analysis.
There is much more, e.g., on numerical methods. There a good start is LINPACK, the software, associated documentation, and references.
(5) More
The next two topics would be probability theory and statistics.
For a first text in either of these two, I'd suggest you find several leading research universities, call their math departments, and find what texts they are using for their first courses in probability and statistics. I'd suggest you get the three most recommended texts, carefully study the most recommended one, and use the other two for reference.
Similarly for calculus and linear algebra.
For more, that would take us into an undergraduate math major. Again, make some phone calls for a list of recommended texts. One of those might be
W. Rudin, Principles of Mathematical Analysis.
aka, "Baby Rudin". It's highly precise and challenging.
For more,
H. Royden, Real Analysis
W. Rudin, Real and Complex Analysis
L. Breiman, Probability
M. Loeve, Probability
J. Neveu, Mathematical Foundations of the Calculus of Probability
The last two are challenging.
For Bayesian methods, the foundation is conditional expectation, which comes from the Radon-Nikodym theorem; Rudin's Real and Complex Analysis has a nice proof of that theorem due to John von Neumann.
After those texts, you can often derive the main results of statistics on your own or with just a little help from Wikipedia. E.g., for the Neyman-Pearson result in statistical hypothesis testing, there is a nice proof using the Hahn decomposition, which can be derived from the Radon-Nikodym theorem.