Hacker News
by bluetwo 3190 days ago
I know it is popular to say that these techniques are based on how the brain works, but when I read about them, I have my doubts.

Can anyone take a real world example of human behavior and show me how it relates to how these techniques predict humans will behave?

I love the field but feel like there is a temptation to take giant leaps not supported by other observations.

4 comments

We say ANNs are "based on how the brain works" because the original mathematical model was an attempt by McCulloch and Pitts to explain how complex behavior arises from networks of simple neurons.

A neuron is either activated or not, and each of its many inputs can be either excitatory (encourages activation) or inhibitory (discourages activation). McCulloch and Pitts formalized this as a weighted sum of the inputs that was then thresholded to 0 or 1. They showed some basic theoretical results from this that lent it credibility as a model for how intelligence can arise from neurons. Essentially, they said behavior can be described as a classifier.
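
To make the thresholded unit concrete, here is a minimal sketch in Python. The function names, the +1/-1 weights, and the thresholds are illustrative assumptions, not taken from the original paper (which, as far as I recall, treated inhibition as an absolute veto rather than a negative weight).

```python
def mcp_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Excitatory inputs get weight +1, inhibitory inputs get weight -1 (assumed values).
def and_gate(a, b):
    return mcp_neuron([a, b], [1, 1], threshold=2)


def or_gate(a, b):
    return mcp_neuron([a, b], [1, 1], threshold=1)


def a_and_not_b(a, b):
    # The second input is inhibitory: it can cancel the excitation from the first.
    return mcp_neuron([a, b], [1, -1], threshold=1)
```

Each gate is a tiny classifier over its binary inputs, which is the sense in which the model describes behavior as classification.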

AFAIK, they didn't go much into how the weights were actually learned. Different strategies were tried, but we ultimately started to soften the threshold function into the logistic function (to make the network differentiable) and solve for the weights by gradient descent.
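
As a rough sketch of that last step, here is a single logistic unit fit by per-example gradient descent on the OR function. The learning rate, epoch count, cross-entropy loss, and all names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: a smooth, differentiable stand-in for the hard threshold."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_neuron(data, lr=0.5, epochs=2000):
    """Fit weights and bias of one logistic unit by gradient descent."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, target in data:
            # For cross-entropy loss, the gradient w.r.t. the pre-activation is (y - target).
            err = predict(w, b, x) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_logistic_neuron(OR_DATA)
```

Thresholding `predict(w, b, x)` at 0.5 recovers a hard classifier like the original model, but the weights were found by following gradients rather than set by hand.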

Modern Deep Learning makes the additional assumption that neurons in the same layer are not interconnected. This assumption, along with the fact that we're just dealing with weighted sums, allows us to describe networks in matrix form, allows us to compute the gradients with backprop, and allows efficient simulation on the GPU. This assumption is more practical than biological.
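
A minimal sketch of the matrix form, in plain Python so it stays self-contained: because units within a layer don't talk to each other, a whole layer reduces to one matrix-vector product followed by an elementwise nonlinearity. The layer sizes and weight values below are made up for illustration, and only the forward pass is shown (no backprop).

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def layer(W, b, x):
    """One layer: sigmoid(W x + b), applied elementwise."""
    return [1.0 / (1.0 + math.exp(-(z + b_i))) for z, b_i in zip(matvec(W, x), b)]


def forward(params, x):
    """Feed x through a list of (W, b) pairs, one pair per layer."""
    for W, b in params:
        x = layer(W, b, x)
    return x


# Hypothetical 2 -> 3 -> 1 network with arbitrary weights.
params = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
output = forward(params, [1.0, 2.0])
```

Each row of `W` holds the incoming weights of one unit, so the per-neuron picture and the matrix picture describe the same computation.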

> show me how it relates to how [...] humans will behave?

[This page][1] attempts to connect the dots between the McCulloch and Pitts model, the resulting classifiers, and behavior. Essentially, the theory was that neurons can be formalized into classifiers, and behavior is just the output of these classifiers. I don't know too much about modern neuroscience, but given the amazing results we are seeing these days in vision, language, and planning, I'd say the central ideas of the theory are still credible.

[1]: http://www.mind.ilstu.edu/curriculum/modOverview.php?modGUI=...

First of all, neurons don't have just one activation function. Each dendrite has one, so anything from dozens to thousands. Second, that definition doesn't cover the entire issue of multiple feedback loops. Third, this doesn't cover memory effects at structural (cytoskeleton) and local levels (vesicles), much less genetic levels (RNA and your genes). And then we haven't even gotten into metabolomic and epigenetic wiring in your neurons...

Calling those chained regressions similar to the brain is about as correct as saying that a 3-year-old's drawing of a car is similar to a real Tesla...

I mean, doi.

McCulloch and Pitts published in 1943. Of course we know more about the brain now.

If I were to ask you, "How does intelligence arise from a network of activations?", would you genuinely say that it has nothing to do with the McCulloch and Pitts theory?

I would honestly say we really have no clue, and maybe add that as far as we can tell, activations play as much of a role in intelligence as a myriad of other factors.

But more generally, I am just so tired of this "brain metaphor" in deep learning. It is a funny way to wake up your students (well, at least 10 years ago it was...), but trying to stretch this metaphor much further than that is just painful. Heck, even the activation "functions" (plural, as we now know) in a neuron aren't really a set (!) of (singular, independent) functions; it's just a top-level name for a mind-boggling number of things happening as neurons "fire", with a mathematical formalism to approximate what's going on. In fact, calling an activation a "function" is probably belittling the biological processes behind it.

"Inspired by biology" is typically a better way to think about it. Airplanes have wings inspired by birds, and they share some structural similarities, but in practice they serve very different functions.

I would even say that this is somewhat revisionist history. From my perspective, this all started from an attempt by Kolmogorov to solve Hilbert's 13th problem:

https://en.wikipedia.org/wiki/Hilbert%27s_thirteenth_problem

Kolmogorov authored a paper titled "On Representation of Continuous Functions of Several Variables by Superpositions of Continuous Functions of Smaller Number of Variables," that basically solved this in 1961. This led to a nice back-and-forth series of papers between Kolmogorov and Arnold, but the one that became more important is Kolmogorov's paper, "On the Representation of Continuous Functions of Many Variables by Superposition of Continuous Functions of One Variable and Addition," in 1963. What this paper proves is that any continuous function defined on the n-dimensional unit cube can be represented as a superposition of 2n+1 continuous one-dimensional outer functions, each applied to a sum of continuous one-dimensional inner functions:

https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Arnold_repr...

Now, the problem with this theorem is that it doesn't say how to find these magical functions. However, in 1989 Cybenko published the paper, "Approximation by Superpositions of a Sigmoidal Function," which both extends and weakens the above result. Basically, he loses the 2n+1 bound, but gives a concrete form for these functions: a linear projection inside a superposition of sigmoids. This led to the universal approximation theorem:

https://en.wikipedia.org/wiki/Universal_approximation_theore...

and, I would contend, the underpinnings of the modern neural net models. Now, is there any biology in there? No. It's a long series of function approximation papers. That said, I don't know the authors involved or what inspired them to write these papers. However, given that we have a documented history of dry function approximation papers that give us the mathematical power that we need to begin to justify these models, I tend to feel that the biological connections are oversold.
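
For reference, here is how the two results are usually written. The symbols are the conventional textbook ones, not taken from this thread. In the commonly cited form, the Kolmogorov-Arnold representation is exact and has 2n+1 outer terms, while Cybenko's form is only an approximation and the number of terms N is not bounded in advance.

```latex
% Kolmogorov-Arnold: exact representation of a continuous f on [0,1]^n
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% Cybenko: finite sums of this form are dense in C([0,1]^n) for sigmoidal \sigma
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left( w_j^{\top} x + \theta_j \right),
\qquad \sup_{x \in [0,1]^n} \lvert G(x) - f(x) \rvert < \varepsilon
```

The second expression is exactly a one-hidden-layer network with N sigmoid units and a linear output, which is why the result gets read as a statement about neural nets.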

That timeline seems to miss that Yann LeCun was already working on ConvNets in 1988. I don't think anyone waited for the Universal Approximation theorem to start building neural architectures; it was just a tangentially interesting mathematical result.

Which paper are you speaking about? Certainly, I'm always interested in a more complete history. I'm currently on LeCun's page and can't figure out which paper you're referring to:

http://yann.lecun.com/exdb/publis/index.html

More generally, a common trope in NN papers and books is to draw a graph for matrix-vector multiplication and then draw the analogy that these are like neurons in the brain and this represents their connectivity. This is an example of the kind of retrofitted biological analogies that frustrate me. Again, certainly, I don't know the motivations behind everyone in the field, but I do contend that many of the more powerful theorems have nothing to do with biology and have other origins.

It would probably be more accurate to say "these techniques are based on how we thought the brain worked". For a historical summary, it's not the worst sin in the world.

About this topic (the relationship between ML and what the brain really does), I read about this interesting advance today: https://www.quantamagazine.org/new-theory-cracks-open-the-bl...