by robert_tweed 4400 days ago
This is a pretty good article, but I'm seeing a lot of confusion in this thread because the article is maybe one step ahead of the basic intuition needed to understand why ANNs are not magical and are not artificial intelligence (at least not feed-forward networks).

Perhaps a simpler way to look at it is to understand that a feed-forward ANN is basically just a really fancy transformation matrix.

OK, so unless you know linear algebra, you're probably now asking what's a transformation matrix? Without the full explanation, the important understanding is why they are so important in 3D graphics: they can perform essentially arbitrary operations (translation, rotation, scaling) on points/vectors. Once you have set up your matrix, it will dutifully perform the same transformations on every point/vector you give it. In graphics programming, we use 4x4 matrices to perform these transformations on 3D points (vertices) but the same principle works in any number of dimensions - you just need a matrix that is one bigger than the number of dimensions in your data*.

Edit: For NNs the matrices don't always have to be square. For instance, you might want your output space to have far fewer dimensions than your input. If you want a simple yes/no decision then your output space is one-dimensional. The only reason the matrices are square in 3D graphics is that the points going in and the points coming out are both 3-dimensional.
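To make the matrix idea concrete, here is a minimal sketch in plain Python. The matrix values and the names `mat_vec`, `transform_point`, `SCALE_THEN_TRANSLATE` and `PROJECT_TO_SCORE` are made up for illustration; they are not from the article. It shows one 4x4 matrix applied to several 3D points, and a non-square 1x3 matrix that maps a 3D input down to a single number.

```python
# Sketch only: hypothetical matrices chosen for illustration.

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transform_point(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    # Append 1 so the fourth column can carry the translation.
    x, y, z, w = mat_vec(matrix, list(point) + [1.0])
    return [x / w, y / w, z / w]

# Scale by 2, then translate by (10, 0, 0), folded into a single matrix.
SCALE_THEN_TRANSLATE = [
    [2.0, 0.0, 0.0, 10.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# A non-square 1x3 matrix: three input dimensions, one output dimension.
PROJECT_TO_SCORE = [[1.0, -1.0, 0.5]]

points = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)]

# The same matrix performs the same transformation on every point.
moved = [transform_point(SCALE_THEN_TRANSLATE, p) for p in points]
scores = [mat_vec(PROJECT_TO_SCORE, list(p))[0] for p in points]

print(moved)   # [[12.0, 4.0, 6.0], [10.0, 0.0, 0.0]]
print(scores)  # [0.5, 0.0]
```

The extra row and column in the 4x4 case are what allow translation, which is why the matrix is one bigger than the data.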

What a neural network does is take a bunch of "points" (the input data) in some arbitrary, high number of dimensions and perform the same transformation on all of them, so as to distort that space. The reason it does this is so that the points go from being some complex intertwining that might appear random or intractable, into something where the points are linearly separable: i.e., we can now draw a series of planes in between the data that segments it into the classifications we care about.
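A tiny worked example of "distorting the space until it is linearly separable", as a sketch rather than anything taken from the article. XOR is the classic case where no single straight line separates the two classes in the input space. The weights below are hand-picked (not trained), and the ReLU non-linearity is my own choice for illustration; `HIDDEN_W`, `HIDDEN_B`, `hidden` and `classify` are hypothetical names.

```python
# Sketch only: hand-picked weights, ReLU assumed as the non-linearity.

def relu(vector):
    return [max(0.0, x) for x in vector]

def affine(weights, bias, vector):
    """Matrix-vector product plus a bias vector."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

HIDDEN_W = [[1.0, 1.0],
            [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]

def hidden(point):
    """Map a 2D input point into the distorted 2D hidden space."""
    return relu(affine(HIDDEN_W, HIDDEN_B, point))

def classify(point):
    """In the hidden space a single line, h1 - 2*h2 = 0.5, separates the classes."""
    h1, h2 = hidden(point)
    return 1 if h1 - 2.0 * h2 > 0.5 else 0

xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
hidden_points = [hidden(p) for p in xor_inputs]
labels = [classify(p) for p in xor_inputs]

print(hidden_points)  # [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [2.0, 1.0]]
print(labels)         # [0, 1, 1, 0]
```

After the hidden layer, the two "1" points land on the same spot and the two "0" points sit on the other side of one straight line, which is all the output layer needs.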

The only difference between a transformation matrix and a neural network is that a neural network has at least two layers. In other words, it is two (or more) transformation matrices bolted together. For reasons that are a bit too complex to get into here, this allows an NN to perform more complex transformations than a single matrix can. In fact, it turns out that an arbitrarily large NN can perform any polynomial-based transformation on the data.

The reason this is often seen as somewhat magic is that although you can tell what transformations a neural network is doing in trivial cases, NNs are generally used where the number of dimensions is so large that reasoning about what it is doing is difficult. Different training methods can give wildly different networks that seemingly give much the same results, or fairly similar networks that give wildly different results. How easy it is to understand the various convolutions that are taking place rather depends on what the input data represents. In the case of computer vision it can be quite easy to visualise the features that each neuron in the hidden layer is looking for. In cases where the data is more arbitrary, it can be much harder to reason about, so if your training algorithm isn't performing as you'd like, it can be difficult to understand why it isn't working, even if you already understand that the basic principle of a feed-forward network is just a bunch of simple algebra.

3 comments

> The only difference between a transformation matrix and a neural network is that a neural network has at least two layers. In other words, it is two (or more) transformation matrices bolted together. For reasons that are a bit too complex to get into here, this allows an NN to perform more complex transformations than a single matrix can. In fact, it turns out that an arbitrarily large NN can perform any polynomial-based transformation on the data.

Nice explanation. I need one clarification, though. Isn't matrix multiplication associative? Isn't thus any transformation defined by two matrices representable by a single matrix that is the product of the two matrices?

I am probably misunderstanding how NNs bolt matrices together.

You apply a non-linear function (usually some sigmoid) to the output vector after each matrix product. Otherwise, you'd be correct and any multi-layer ANN could be expressed as a single-layer network.

Thanks. It makes sense. The sigmoid is the activation function of the output "neuron". Unfortunately, matrix algebra here is not as useful as in computer graphics.

No problem. Actually, I personally found that a pretty intuitive understanding of linear algebra & vector calculus makes quite a lot of ML straightforward to approach geometrically.
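The associativity point and the role of the sigmoid can be checked numerically. This is a minimal sketch with made-up weights (`W1`, `W2` and the helper names are hypothetical): without a non-linearity the two layers collapse into the single matrix `W2 * W1`, and with a sigmoid in between the result no longer behaves like a linear map.

```python
# Sketch only: arbitrary example weights, no biases.
import math

def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def mat_mul(a, b):
    """Matrix product a * b, both given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

W1 = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]  # layer 1: 2 inputs -> 3 hidden
W2 = [[2.0, 0.0, -1.0]]                      # layer 2: 3 hidden -> 1 output

def two_layer_linear(x):
    return mat_vec(W2, mat_vec(W1, x))

def two_layer_sigmoid(x):
    return mat_vec(W2, [sigmoid(h) for h in mat_vec(W1, x)])

x = [1.0, 2.0]

# Purely linear layers: identical to one matrix, by associativity.
stacked = two_layer_linear(x)
collapsed = mat_vec(mat_mul(W2, W1), x)

# A linear map f must satisfy f(2x) == 2 * f(x); the sigmoid version does not.
doubled_input = two_layer_sigmoid([2.0 * v for v in x])[0]
doubled_output = 2.0 * two_layer_sigmoid(x)[0]

print(stacked, collapsed)             # [-7.0] [-7.0]
print(doubled_input, doubled_output)  # two clearly different numbers
```

Since no single matrix can violate `f(2x) == 2 * f(x)`, the sigmoid version cannot be rewritten as one matrix product.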
Well, I suspect some kind of transformation could be used to turn a two-level NN into a one-level one. The thing is, the resulting one-level network might be more complex and less useful than the original two-level network. Still, I think this does illustrate the limitations of multilevel networks.

Another way to see this is to notice that NNs and SVMs[1] are (approximately or exactly) equivalent [2] because they both involve the fairly simple linear and non-linear transformations we've been looking at.

[1] http://en.wikipedia.org/wiki/Support_vector_machine

[2] http://www.staff.ncl.ac.uk/peter.andras/PAnpl2002.pdf

Interesting to note, though, that even with a linear network that can be represented by a single matrix, training with multiple layers can be faster and easier and can converge to better results, because of the different gradient and parameter space presented to the optimization algorithm.

A nice, cogent explanation.

It's good to remember that the ANN's input often comes as vector data. The ANN isn't transforming those vectors directly; rather, it transforms the input into a higher-dimensional "feature" space and performs the linear transform there. If you take the separating plane that's drawn in the feature space and reverse the map, you'll see that the ANN has drawn a complex surface between the points it wants to recognize and those it rejects.

So it's basically a heuristic and no more intelligent than Taylor's series.

So, IIUC creating an NN basically follows this process:

- define an input vector space (i.e. choose dimensions you want to operate on with input data)

- define your categories in another space (or another basis in the same space?)

- set up a transformation pipeline between the two spaces (with at least two stages)

- devise an algorithm that takes categorised elements and produces new transformation matrices

- train the NN (i.e. feed input and categorise the result so that through some algorithm the transformation matrices converge)
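The steps above can be sketched as follows. This is a toy under stated assumptions, not a definitive recipe: the task (XOR), the sigmoid activation, the squared-error loss, full-batch gradient descent, the hidden size, learning rate and epoch count are all my own choices, and every function name is hypothetical.

```python
# Sketch only: a two-stage network trained by plain gradient descent.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_params(n_in, n_hidden, seed=0):
    """Step 3: two stages (input -> hidden -> one output), random start."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return w1, [0.0] * n_hidden, w2, 0.0

def forward(params, x):
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output

def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

def train(params, data, learning_rate=0.5, epochs=2000):
    """Steps 4 and 5: turn categorised examples into new matrices, repeatedly."""
    w1 = [row[:] for row in params[0]]
    b1, w2, b2 = params[1][:], params[2][:], params[3]
    for _ in range(epochs):
        grad_w1 = [[0.0] * len(row) for row in w1]
        grad_b1 = [0.0] * len(b1)
        grad_w2 = [0.0] * len(w2)
        grad_b2 = 0.0
        for x, target in data:
            hidden, output = forward((w1, b1, w2, b2), x)
            # Gradient of 0.5 * (output - target)^2 through the output sigmoid.
            delta_out = (output - target) * output * (1.0 - output)
            grad_b2 += delta_out
            for j, h in enumerate(hidden):
                grad_w2[j] += delta_out * h
                # Chain rule back through hidden unit j.
                delta_hidden = delta_out * w2[j] * h * (1.0 - h)
                grad_b1[j] += delta_hidden
                for i, xi in enumerate(x):
                    grad_w1[j][i] += delta_hidden * xi
        # Move every weight a small step against its gradient.
        for j in range(len(w2)):
            w2[j] -= learning_rate * grad_w2[j]
            b1[j] -= learning_rate * grad_b1[j]
            for i in range(len(w1[j])):
                w1[j][i] -= learning_rate * grad_w1[j][i]
        b2 -= learning_rate * grad_b2
    return w1, b1, w2, b2

# Steps 1 and 2: a 2D input space and a 1D category space (0.0 or 1.0).
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

initial = init_params(n_in=2, n_hidden=3)
initial_loss = mean_squared_error(initial, XOR_DATA)
trained = train(initial, XOR_DATA)
final_loss = mean_squared_error(trained, XOR_DATA)

print(initial_loss, final_loss)
```

The error should go down over training, but how far it goes depends on the random starting point and the settings above, so treat any particular run as illustrative only.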