|
I'm not sure where the "linear" in "linear algebra" comes from. You hear about
linear algebra in relation with machine learning a lot because training a
neural net (with the backpropagation algorithm and friends) requires some
matrix arithmetic. Inputs to neural nets are vectors or matrices, their
weights are (arrayed in) vectors or matrices, their outputs are - well,
usually scalars but can also be vectors or matrices. Also, the use of linear/ nonlinear in machine learning is a bit misleading. A
"line" is not necessarily a "straight line", but usually when we say "linear"
we mean "straight" and so when we want to say "not straight" we use
"nonlinear". In any case, when we say "line" in machine learning we mean a function, the
function of a line. So a "nonlinear" function is a function that curves and
turns, e.g. a sigmoid, whereas a "linear" function is straight as a rod. Why a line? Classifiers er classify by drawing a line through space. "Space"
means a Cartesian space where our training examples are represented as points
(hence, "data points"). Data points are located in Cartesian space according
to coordinates that represent their attributes, or features (these coordinates
are the "feature vectors" that are input to neural nets). We classify data
points by drawing a line between those that belong to one class and those that
belong to other classes. More to the point, when we train a classifier, we
find the parameters of a function of a line that separates the points of
separate classes and when we want to classify a new point, we look at where it
falls with relation to that line. So that's where all that stuff about lines and "linear" and "nonlinear" models
comes from. A "linear model" or "linear classifier" can only draw straight
lines. A "nonlinear model" can go twirling around madly. Finally, "non-linear" doesn't mean "bad". There are tradeoffs- in particular,
the "bias variance tradeoff" that I hint at in my earlier comment. A linear model
is more limited in what it can represent, but a nonlinear model is less likely
to represent data that it hasn't seen in training. |