
by sergiosgc 4448 days ago

> The only difference between a transformation matrix and a neural network is that a neural network has at least two layers. In other words, it is two (or more) transformation matrices bolted together. For reasons that are a bit too complex to get into here, this allows an NN to perform more complex transformations than a single matrix can. In fact, it turns out that an arbitrarily large NN can perform any polynomial-based transformation on the data.

Nice explanation. I need one clarification, though. Isn't matrix multiplication associative? If so, isn't any transformation defined by two matrices representable by a single matrix, namely the product of the two?

I am probably misunderstanding how NNs bolt matrices together.
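The associativity argument in the question can be checked directly. The following is a minimal sketch in plain Python, assuming two purely linear layers with no activation function and no bias terms; the matrices W1, W2 and the input x are arbitrary illustrative values, not taken from the article. Applying the layers one after the other gives the same result as applying their precomputed product.

```python
# Minimal sketch: two linear layers with no activation collapse into one matrix.
# W1, W2 and x are arbitrary illustrative values (hypothetical, not from the article).

def matmul(A, B):
    """Multiply matrix A (m x n) by matrix B (n x p), both as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    """Multiply matrix A (m x n) by vector x (length n)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

W1 = [[1, 2], [3, 4], [5, 6]]   # first layer: 2 inputs -> 3 hidden units
W2 = [[1, 0, -1], [2, 1, 0]]    # second layer: 3 hidden units -> 2 outputs
x = [7, -2]

two_layers = matvec(W2, matvec(W1, x))  # W2 (W1 x): layer by layer
W = matmul(W2, W1)                      # the single equivalent 2x2 matrix
one_layer = matvec(W, x)                # (W2 W1) x: one "layer"

print(two_layers, one_layer)  # prints [-20, 19] [-20, 19]
```

Because only integers are involved, the two results match exactly, which is the collapse the question anticipates.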


tfgg 4448 days ago

You apply a non-linear function (usually some sigmoid) on the output vector after each matrix product. Otherwise, you'd be correct and any multi-layer ANN could be expressed as a single layer network.
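A small illustration of why the non-linearity matters. This is a sketch only, assuming a logistic sigmoid between the two layers and no bias terms; the weights are hypothetical values chosen for illustration. Any map of the form x -> W x satisfies f(c·x) = c·f(x), so if a network fails that check, no single matrix can reproduce it.

```python
import math

# Hypothetical weights chosen only for illustration (no bias terms).
W1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]

def matvec(A, x):
    """Multiply matrix A by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sigmoid(v):
    """Element-wise logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def linear_net(x):
    """Two layers with no activation: equivalent to the single matrix W2 W1."""
    return matvec(W2, matvec(W1, x))

def sigmoid_net(x):
    """Two layers with a sigmoid applied to the hidden layer's output."""
    return matvec(W2, sigmoid(matvec(W1, x)))

def is_homogeneous(f, x, c=2.0, tol=1e-9):
    """Check f(c*x) == c*f(x), a property every x -> W x map has."""
    lhs = f([c * xi for xi in x])
    rhs = [c * yi for yi in f(x)]
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))

x = [0.5, -0.25]
print(is_homogeneous(linear_net, x))   # True
print(is_homogeneous(sigmoid_net, x))  # False
```

Passing this single check does not prove a map is linear, but failing it is enough to show that the sigmoid network is not a single matrix.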


sergiosgc 4447 days ago

Thanks. It makes sense. The sigmoid is the activation function of the output "neuron". Unfortunately, matrix algebra here is not as useful as in computer graphics.


tfgg 4447 days ago

No problem. Actually, I personally found that a pretty intuitive understanding of linear algebra & vector calculus makes quite a lot of ML straightforward to approach geometrically.


joe_the_user 4448 days ago

Well, I suspect some kind of transformation could be used to turn a two-level NN into a one-level one. The thing is, the resulting one-level network might be more complex and less useful than the original two-level network. Still, I think this does illustrate the limitations of multilevel networks.

Another way to see this is to notice that NNs and SVMs [1] are (approximately or exactly) equivalent [2] because they both involve the fairly simple linear and non-linear transformations we've been looking at.

[1] http://en.wikipedia.org/wiki/Support_vector_machine
[2] http://www.staff.ncl.ac.uk/peter.andras/PAnpl2002.pdf


dwiel 4448 days ago

Interesting to note, though, that even for a linear network that could be represented by a single matrix, using multiple layers can make training faster and easier and lead to better results, because the optimization algorithm is presented with a different gradient and parameter space.
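A toy scalar sketch of the "different gradient" point, assuming plain gradient descent on squared loss for one data point (x = 1, target y = 3) with a hypothetical learning rate of 0.05. Model A uses a single weight w; model B uses two "layers" a and b whose product plays the role of w. Both start from the same function (a·b = w), but one gradient step moves them to different functions because the chain rule through the product rescales each update. This only shows that the optimization dynamics differ. It says nothing about which parameterization converges faster or better in general.

```python
# Toy setup (hypothetical values): fit y = 3x from a single point with squared loss.
x, y = 1.0, 3.0
lr = 0.05

def step_single(w):
    """One gradient step for the one-layer model: prediction w * x."""
    grad = 2 * (w * x - y) * x          # dL/dw
    return w - lr * grad

def step_factored(a, b):
    """One gradient step for the two-layer model: prediction (a * b) * x."""
    err = 2 * (a * b * x - y) * x       # dL/d(ab)
    grad_a, grad_b = err * b, err * a   # chain rule through the product
    return a - lr * grad_a, b - lr * grad_b

w = 0.5
a, b = 0.5, 1.0   # a * b == w, so both models start as the same linear map

w_next = step_single(w)
a_next, b_next = step_factored(a, b)
print(w_next, a_next * b_next)  # the effective weights now differ
```

After one step the single weight reaches 0.75 while the factored product reaches 0.84375, even though both models can represent exactly the same set of linear functions.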
