Hacker News
by perl4ever 2197 days ago
I'm not good at math, but I'm confused by the association of AI with non-linear stuff, setting aside the association of non-linear with "bad". I thought ML involved linear algebra or something (says xkcd!) which would presumably be...linear?
4 comments

The inner activation function (AF) of neurons is inherently nonlinear; it has to be in order to solve any problem that is not linearly separable (which is basically all of the interesting problems). Often the AF nonlinearity shows up as a thresholding operation following a linear weighted sum, but that's not the only mechanism.
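To make the "weighted sum, then threshold" idea concrete, here is a minimal sketch in plain Python. The weights and biases are hand-picked for illustration (not trained), and the names `step`, `neuron` and `xor_net` are made up for this example. It wires three threshold neurons together to compute XOR, the textbook example of a problem that no single straight line can separate.

```python
def step(z):
    """Hard threshold: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    """Linear weighted sum followed by a nonlinear threshold."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


def xor_net(a, b):
    """Two hidden threshold units feeding one output unit.

    The weights are chosen by hand for illustration, not learned.
    """
    h_or = neuron((a, b), (1, 1), -0.5)    # fires if a OR b
    h_and = neuron((a, b), (1, 1), -1.5)   # fires only if a AND b
    return neuron((h_or, h_and), (1, -1), -0.5)  # OR but not AND


# Truth table for all four binary inputs.
xor_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```

Each neuron on its own only draws one straight boundary; it is the thresholding between the layers that lets the combination do something a single line cannot.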

And yet neurons are not "pure" binary thresholders the way logic gates are, because a hard binary step has a derivative of zero everywhere except at the jump (where it isn't defined at all), and backpropagation needs a usable gradient. The compromise neurons make is a "smoothed threshold" or sigmoidal curve, which is differentiable but still very nonlinear.
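A rough way to see the difference, as a sketch using only the standard library: estimate the slope of a hard step and of the logistic sigmoid numerically. The helper names (`numeric_grad`, `sigmoid_grad`) are just for this example; the sigmoid shown is the standard logistic function, which is one common choice of smoothed threshold, not the only one.

```python
import math


def step(z):
    """Hard binary threshold."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Logistic sigmoid: a smoothed version of the threshold."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Analytic derivative of the logistic sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


# Away from the jump the step has zero slope, so there is no gradient
# signal to follow; the sigmoid has a nonzero slope at every point.
for z in (-2.0, 0.5, 3.0):
    print(z, numeric_grad(step, z), sigmoid_grad(z))
```

The step's estimated slope comes out as zero at each of those points, while the sigmoid's slope is small but never zero, which is what gradient-based training relies on.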

I'm not sure where the "linear" in "linear algebra" comes from. You hear about linear algebra in connection with machine learning a lot because training a neural net (with the backpropagation algorithm and friends) requires some matrix arithmetic. Inputs to neural nets are vectors or matrices, their weights are (arrayed in) vectors or matrices, their outputs are - well, usually scalars but can also be vectors or matrices.
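Here is a small sketch of that matrix arithmetic in plain Python, with made-up weights `W1`, `W2` and input `x`. It also shows why the nonlinearity matters: two weight matrices applied back to back, with nothing in between, are equivalent to a single matrix, so the stack is still just one linear map. ReLU is used as the activation here only because it keeps the arithmetic easy to follow; that choice is an assumption of the example.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(v):
    """Elementwise max(0, v): a simple nonlinear activation."""
    return [max(0.0, vi) for vi in v]


# Illustrative weights and input, not taken from any real model.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 0.5]

stacked = matvec(W2, matvec(W1, x))          # two linear layers in a row
collapsed = matvec(matmul(W2, W1), x)        # one layer with weights W2 @ W1
with_relu = matvec(W2, relu(matvec(W1, x)))  # activation between the layers

print(stacked, collapsed, with_relu)
```

With these weights the purely linear stack outputs 0 for every input, while the version with ReLU computes the absolute difference of the two inputs, which no single matrix can do.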

Also, the use of linear/nonlinear in machine learning is a bit misleading. A "line" is not necessarily a "straight line", but usually when we say "linear" we mean "straight" and so when we want to say "not straight" we use "nonlinear".

In any case, when we say "line" in machine learning we mean a function, the function of a line. So a "nonlinear" function is a function that curves and turns, e.g. a sigmoid, whereas a "linear" function is straight as a rod.

Why a line? Classifiers, er, classify by drawing a line through space. "Space" means a Cartesian space where our training examples are represented as points (hence, "data points"). Data points are located in Cartesian space according to coordinates that represent their attributes, or features (these coordinates are the "feature vectors" that are input to neural nets). We classify data points by drawing a line between those that belong to one class and those that belong to other classes. More to the point, when we train a classifier, we find the parameters of a function of a line that separates the points of separate classes, and when we want to classify a new point, we look at where it falls in relation to that line.
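As a sketch of "find the parameters of a line, then see which side a new point falls on", here is the classic perceptron learning rule in plain Python. The six training points and the `max_epochs` cap are invented for illustration, and the perceptron is just one simple way to fit a separating line, swapped in here because it is short.

```python
def predict(w, b, point):
    """Return 1 if the point is on the positive side of the line w.x + b = 0."""
    z = sum(wi * xi for wi, xi in zip(w, point)) + b
    return 1 if z > 0 else 0


def train_perceptron(points, labels, max_epochs=1000):
    """Perceptron rule: nudge the line toward every misclassified point."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for point, label in zip(points, labels):
            error = label - predict(w, b, point)  # -1, 0 or +1
            if error != 0:
                w = [wi + error * xi for wi, xi in zip(w, point)]
                b += error
                mistakes += 1
        if mistakes == 0:  # every training point is on the right side
            break
    return w, b


# Toy 2-D feature vectors: class 0 near the origin, class 1 further out.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_perceptron(points, labels)
print("line parameters:", w, b)
print("new point (6, 6) ->", predict(w, b, (6, 6)))
print("new point (0, 0) ->", predict(w, b, (0, 0)))
```

This only works because the toy data can be split by a straight line; on data like XOR the loop would simply run until it hits the epoch cap.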

So that's where all that stuff about lines and "linear" and "nonlinear" models comes from. A "linear model" or "linear classifier" can only draw straight lines. A "nonlinear model" can go twirling around madly.

Finally, "non-linear" doesn't mean "bad". There are tradeoffs, in particular the "bias variance tradeoff" that I hint at in my earlier comment. A linear model is more limited in what it can represent, but a nonlinear model is less likely to generalize to data that it hasn't seen in training.
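A toy illustration of that tradeoff, as a sketch with invented data: five noisy samples of a roughly straight trend, fitted once with a least-squares line and once with a degree-4 polynomial that passes through every point exactly. The held-out point `(x_new, y_new)` assumes the underlying trend is y = x; both that assumption and the noise values are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def lagrange(xs, ys, x):
    """Evaluate the polynomial that passes exactly through all (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Noisy samples of an assumed underlying trend y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.6, 2.4, 2.7, 4.3]
x_new, y_new = 5.0, 5.0  # held-out point on the assumed trend

slope, intercept = fit_line(xs, ys)
line_pred = slope * x_new + intercept
poly_pred = lagrange(xs, ys, x_new)

# Largest training error of each model.
line_train_err = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))
poly_train_err = max(abs(lagrange(xs, ys, x) - y) for x, y in zip(xs, ys))

print("train error:", line_train_err, poly_train_err)
print("held-out error:", abs(line_pred - y_new), abs(poly_pred - y_new))
```

The wiggly model wins on the training points (zero error) and loses badly on the new one (it predicts about 16 where the line predicts about 5.01), which is the high-variance side of the tradeoff. The held-out point sits just outside the training range, which exaggerates the effect, so treat it as a cartoon rather than a benchmark.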

- "linear" in "linear algebra" comes from "system of linear equations"

- NNs can absolutely represent non-linear functions, and they are based on solving systems of linear equations.

- The non-linear function here has nothing to do with the linearity of the system of linear equations used to construct it.

- The two main sources of non-linearity are (a) the inputs (e.g., an image, or a series of images varying in a non-linear fashion), and (b) the activation functions.

The underlying derivatives are linear (like all derivatives), but neural networks' ability to approximate arbitrary non-linear functions is one of their biggest strengths.

Yes, so I'm left wondering, when making the association of the math to the badness, how do you decide if the linearity or the non-linearity is the salient part?

Mathematically, you can think of "linear" AI problems as "easy to solve", and non-linear as "difficult". That's part of what the parent means.

Some function being linear means it's easier to guess. If a real world phenomenon is tied to a linear function, then it's easy for AI to guess/approximate.