by therajiv 3259 days ago
As someone primarily interested in interpretation of deep models, I strongly resonate with this warning against anthropomorphization of neural networks. Deep learning isn't special; deep models tend to be more accurate than other methods, but fundamentally they aren't much closer to working like the human brain than e.g. gradient boosting models.

I think a lot of the issue stems from layman explanations of neural networks. Pretty much every time DL is covered by media, there has to be some contrived comparison to human brains; these descriptions frequently extend to DL tutorials as well. It's important for that idea to be dispelled when people actually start applying deep models. The model's intuition doesn't work like a human's, and that can often lead to unsatisfying conclusions (e.g. the panda --> gibbon example that Francois presents).

Unrelatedly, if people were more cautious about anthropomorphization, we'd probably have to deal a lot less with the irresponsible AI fearmongering that seems to dominate public opinion of the field. (I'm not trying to undermine the danger of AI models here, I just take issue with how most of the populace views the field.)

14 comments

I don't have an ML or deep learning background (no Masters or PhD), so I'm adding a comment from experience with backtesting trading systems. We collect market data and design algorithms that seem to produce the kind of outcomes we want, then test them on other data sets that the algorithms have never been applied to. Many iterations later, you can get a decent profitable algorithm. And if the 'holy grail' algo is run in the market long enough, eventually there will be a severe drawdown and it goes bust. The quality of the algo, and I assume of the deep learning model, lies in the quality (breadth and depth) of the data, and in how honest with himself the person modeling it chooses to be. Time and again there will be new 'black swan' or edge events (remember LTCM), because using machine learning is like using the past to predict the future.
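
A toy simulation of the failure mode described above, as a sketch rather than a real backtest. It assumes every candidate "strategy" is pure noise with zero true edge, and shows that picking the best of many candidates on the data you iterated on still produces something that looks profitable. All names and numbers (`N_STRATEGIES`, `make_returns`, the 1% daily volatility) are hypothetical.

```python
import random
import statistics

rng = random.Random(42)

N_STRATEGIES = 200   # hypothetical number of design iterations
N_DAYS = 250         # roughly one trading year per data set

def make_returns(n_days):
    """Daily returns of a strategy with zero true edge (pure noise)."""
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

# Each candidate gets a "design" period and an untouched "later" period.
candidates = [(make_returns(N_DAYS), make_returns(N_DAYS))
              for _ in range(N_STRATEGIES)]

# Select the candidate that looks best on the design data.
best_design, best_later = max(candidates, key=lambda c: statistics.mean(c[0]))

design_mean = statistics.mean(best_design)
later_mean = statistics.mean(best_later)
all_later_mean = statistics.mean(statistics.mean(later) for _, later in candidates)

print(f"winner, design period: {design_mean:+.5f} per day")
print(f"winner, later period : {later_mean:+.5f} per day")
print(f"all candidates, later: {all_later_mean:+.5f} per day")
```

The winner's design-period average is positive purely because it was selected for it; its later-period average is just another draw from noise. Real market data adds regime changes on top of this selection effect, which the sketch does not model.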

I guess as long as the users' expectations are correct it can be useful in some very specific areas. Take the AlphaGo match last year: I was a Go player for more than a decade, and yet AlphaGo's weird move inspired new insights that break the conventional structure / thinking framework of a Go player. From that angle, I do think that even though DL is somewhat of a black box, humans can pick up new insights because it explores areas that a human with 'common sense' would normally find ridiculous to explore.

> The quality of the algo, and I assume of the deep learning model, lies in the quality (breadth and depth) of the data, and in how honest with himself the person modeling it chooses to be.

I've only dabbled with machine-learning here and there for the past 10 years or so, but if there's one thing I've learned so far is that the data behind your ML code (and the way it is structured) is responsible for almost all the success or failure of any given ML algorithm. I have a younger colleague at work whom I've started tutoring, and he seems really interested in doing ML work (maybe because of all the recent hype).

I've tried to emphasize to him several times that ML algorithms come and go and that he should focus a lot of his time on the data itself (from where he intends to collect it? how is it structured? is it reliable? is it "enough"? etc), but it looks like my data-related advice falls on deaf ears every time; he's only interested in me pointing him to the latest cool ML algorithm. I guess he'll live and learn, so to speak.

> I've learned so far is that the data behind your ML code (and the way it is structured) is responsible for almost all the success or failure of any given ML algorithm

Data is indeed a necessary condition but certainly not sufficient. You require a good marriage between engineering features and data to have a good success rate. Learning curves [0] are a good way to understand if your ML algorithm requires more data or better feature engineering.

[0] http://mlwiki.org/index.php/Learning_Curves
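
A minimal sketch of what a learning curve computation looks like, assuming a toy 1-D regression problem (the data generator, the sizes, and the helper names are all made up for illustration, not taken from the linked page). The idea is to refit on growing training subsets and compare training error against error on a fixed validation set.

```python
import random

rng = random.Random(0)

def sample(n):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise (sigma = 0.5)."""
    points = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        points.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
    return points

def fit_line(points):
    """Closed-form least squares for y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def mse(model, points):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)

SIZES = [5, 20, 80, 320]
train, val = sample(max(SIZES)), sample(200)

# One (size, training error, validation error) row per training-set size.
curve = []
for n in SIZES:
    subset = train[:n]
    model = fit_line(subset)
    curve.append((n, mse(model, subset), mse(model, val)))

for n, train_err, val_err in curve:
    print(f"n={n:4d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

Roughly: if the two errors have converged and are still too high, more data is unlikely to help and the features or model need work; if validation error is still well above training error, more data may close the gap.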

Much of the programming with ML has moved towards cleaning, extrapolating and generating the data.

But this type of programming is, miraculously, bug-free. We never hear of data conversion gone wrong, corrupted data, or data mining without conclusive results here. Obviously such bugs lack the glamour of security bugs.

It's also very difficult to catch these errors. Your trained model just doesn't work as well as it could, but how would you be able to tell?
> focus a lot of his time on the data itself... from where he intends to collect it? how is it structured? is it reliable? is it "enough"?

What are the best books on this subject? I suppose it's a very broad topic and thus more difficult to talk about than a single "neural network" algorithm.

Interested in what part of that you feel needs to be explained in more depth? Not sure reading several books is necessary for explaining data collection and data munging...to me it's definitely something best learned by doing.

work in data analysis/stats

Lots of things are best learned by doing. I just noticed there are dozens of books about machine learning algorithms but none on how to gather data. Of course, both those things can be learned independently, but I think there's room for at least a few books about data gathering considering it's so important for good machine learning results.
Here at Manning (we're publishing Francois's book) we have something in our early access program on this now - https://www.manning.com/books/the-art-of-data-usability
This is the domain of statistics, isn't it?
Agreed. AFAIK, only statistics has addressed the question of info sufficiency in data and discriminative power of method. Personally, I think the former is an enormously important subject that isn't addressed well in most ML texts. How much data is necessary to answer a given question in practice? How do you know if your data or method are "good enough"?

From what I've seen, statistics addresses these questions better than CS-taught ML does. CS-based ML is no different from algorithm analysis; it suffers from sensitivity to limits inherent in the data. But ML courses often don't address these limits very rigorously. Yet knowing those limits is all-important when effectively mining information at a professional level.

If you can't tell the decision maker what you know and what you don't, your inference/prediction really isn't useful. From what I've seen, statistics addresses this best.

Thanks for sharing your experience. I'm happy that my previous exposure to trading algorithms at least helped me understand more of what the experts here are talking about. I believe the output model is only as good as the data (at least for the deep learning branch of ML). If the dataset does not cover data points which exist in a wider space but in the same domain of the problem, or which don't yet have a precedent, then we really can't simply assume that it is the algo/model that needs tweaking when shit hits the fan.
This is incredibly true, even with crappy old algorithms you can do A LOT if you have great data.

Recent experience with a company that is building some models based on... a few guys recording a few hours of audio and annotating it. I still can't get over the fact that otherwise smart people think this is going to work at all.

> but it looks like my data-related advice falls on deaf ears every time; he's only interested in me pointing him to the latest cool ML algorithm.

So, it seems their learning/planning algorithm fails, even when it is given the right data. That's unfortunate.

Sorry, I can't help but notice that you aren't happy with their brain's algorithm, while talking about importance of data. I don't say that data doesn't matter or anything. Just random observation.

Could actually be their data, right? Imagine if you had only had experience with software engineering. The only data you use when engineering software are the data you learn when using the product or writing tests, it's all the algorithms behind it that's important. So to them, they just don't have data on situations where the data are important.

Wow that's confusing wording. I hope it makes sense.

It does, but the algorithm doesn't seem to be state-of-the-art; it's more like current ML algorithms, which need lots of data to work successfully in each new domain. Well, there's a lot of room for improvement, at least.
The data processing inequality says processing data does not increase its information content.
But processing does increase the "obviousness" of the information content.

E.g. projecting the data onto independent dimensions doesn't change the information it contains, but it highlights that those dimensions are indeed independent. Decomposing a multimodal distribution into a mixture of unimodal distributions gives more insight than just viewing it as a bunch of data mushed together. And so on.
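
A small sketch of the projection example, assuming made-up 2-D Gaussian data. It rotates the data onto the principal axes of its covariance matrix, which makes the coordinates uncorrelated (that only amounts to independence for jointly Gaussian data, as assumed here). The rotation is invertible, so nothing is gained or lost; only the structure becomes visible. The variable names and the mixing coefficients are hypothetical.

```python
import math
import random

rng = random.Random(1)

def covariance(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]

# Hypothetical data: two observed coordinates that mix two independent sources.
sources = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)) for _ in range(500)]
data = [(s + t, 0.5 * s - t) for s, t in sources]

xs, ys = zip(*data)
cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)

# Angle of the principal axes of a 2x2 covariance matrix.
theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
projected = rotate(data, theta)
us, vs = zip(*projected)
cuu, cvv, cuv = covariance(us, us), covariance(vs, vs), covariance(us, vs)

# Rotating back recovers the original points, so no information was lost.
restored = rotate(projected, -theta)
round_trip_error = max(abs(a - c) + abs(b - d)
                       for (a, b), (c, d) in zip(data, restored))

print(f"covariance before: {cxy:.3f}, after: {cuv:.3e}")
print(f"total variance before: {cxx + cyy:.3f}, after: {cuu + cvv:.3f}")
```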

I think there should be a branch of information theory that quantifies the obviousness of information and how it is changed by various data processing methods.

The "creative" moves may very well come from the search part of the AlphaGo algorithm, though of course the networks have done their jobs of pruning the search space.
I see.. That's true. Though credit still goes to the algo for choosing that particular weird move out of the entire search space (it's just 'weird', something you would think was played by a total newbie to the game). I remember that for that whole week I would watch the broadcast live on YouTube during lunchtime. How devastated I was to see Lee Sedol losing match after match. It was a moment I will never forget; in my mind the computer had crossed an imaginary threshold and it won. I know ML/DL experts will say it is only for a very specific area. But what's stopping more mastery of enough 'specific' areas that the mastery will be broad enough to pass Turing tests?
Careful, that's the sort of thinking that led to the last 'AI Winter': assuming that if enough rule-based expert systems were built, general-purpose systems could be assembled from them and/or enough could be learned to build general-purpose systems.

Now, it is worth noting that DL models are already being assembled together (often with a coordinating DL model to switch between them). This can have the advantage of the smaller models being reusable to some extent (certainly more than expert systems ever were) but is not a panacea. The results are still essentially bespoke models rather than general purpose ones.

Deep Learning obviously has a lot more mileage left in it, given that much human mental labor is 'just' training and using our general-purpose intellects for what amount to a series of rather narrowly defined tasks, but it won't surprise me if there is a wall of some sort lurking just over the horizon that will require a different approach (albeit one that may still be called 'deep learning') to cross.

OTOH, it does seem as though the folks at DeepMind are fairly aggressively pursuing whatever is on the other side of that particular horizon:

https://deepmind.com/blog/neural-approach-relational-reasoni...

https://deepmind.com/blog/cognitive-psychology/

https://deepmind.com/blog/imagine-creating-new-visual-concep...

We can debate, but I don't think another AI winter will happen again in my lifetime. AI work is just earning way too much money for its funding to get cut, and a lot of funding is currently private too.
I wasn't arguing for another AI Winter per se. My warning was more along the lines of pointing out a potential personal "career winter".
I'd be surprised to see inductive learning anytime soon. But I definitely see the next generation of AI systems, robots and their implementation across industry. But that will rapidly fill out and then we will still be left with self determination.
My understanding is that innovation comes from reinforcement learning during self-play (rather than supervised learning of pro games), and thus goes against the best moves suggested by AlphaGo's policy network, in turn pushing it towards new options.

In a sense, it seems innovation arises when the value network forces the policy network to expand the search space because an apparently unlikely move leads to downstream positions deemed favorable.

It's not that simple. The creativity is that the combination of rollouts, policy and value networks allows for more efficient traversal of the search space. Which gets you better exploration of possible paths, meaning more options than a human considered and therefore more creativity.
> Pretty much every time DL is covered by media, there has to be some contrived comparison to human brains

Well, what we've done so far is emulate maybe 1 mm^3 of brain matter - some isolated, very specialized functional blocks in the greater architecture of the brain. They behave as expected - are experts on very narrow topics, but of course fail to integrate their functioning with a larger body of knowledge, because that body just isn't there (yet).

The strength of the human mind is that it has this profusion of little subject matter experts all over the place, covering an enormous array of topics - and then it has an intricate superstructure that integrates the outputs of these narrow expert machines, tweaks their functioning, even subtly alters their inputs, providing coherence to the global output according to the capabilities of the whole system.

We're still far from that complex high level architecture.

> Well, what we've done so far is emulate maybe 1 mm^3 of brain matter - some isolated, very specialized functional blocks in the greater architecture of the brain. They behave as expected - are experts on very narrow topics, but of course fail to integrate their functioning with a larger body of knowledge, because that body just isn't there (yet).

I think you're falling into the same anthropomorphism trap that the GP is talking about. We haven't even broached the most important topic: neural plasticity - a brain's ability to rewire itself based on a complex feedback loop driven by environmental inputs (which are, at this point in human development, an almost infinitely more complex system of culture built up over tens of thousands of years). From my work in neuroscience, it seems that the computational complexity of the state of the art DL algorithms barely registers when compared to a network of a few hundred biological neurons like the nervous system of Caenorhabditis elegans, which is itself far less capable of self reorganization than even the simplest mammalian brain. Hell, even the most basic potentiation that you'd find in decades old research on addiction is far outside the scope of modern machine learning research, and we don't yet have any clean mathematical theories that can emulate plasticity the way back propagation or gradient descent can emulate simple learning.

The current hype around neural networks is the equivalent of saying that we've analytically solved the n-body problem when all we've done is solve a system of equations with two linear variables. The domains are connected but only in the trivial sense that both have variables named "x" and "y."

I think you're far too eager to look for and criticize anthropomorphism - hence you see it where it's not.
You said "what we've done so far is emulate maybe 1 mm^3 of brain matter," comparing computational neural networks to us, a biological system - that's literally anthropomorphising.
You seem to be under the assumption that a typical feedforward DNN is anywhere close to operating like the brain, just on a smaller scale. But that assumption is not correct.

Both the brain and artificial neural networks are connectivist, but that's about where the similarities end. The brain uses completely unknown algorithms and mechanisms that are almost certainly very different from our (current) ANNs. So it's not just a matter of increasing the scale.

That is nowhere near what I am saying.
I think it would help a lot if we brought random forests and SVMs to the same level of performance as DNNs. Demonstrating that more "mechanical" algorithms can be as efficient would dispel some of the anthropomorphism and allow for better analysis of why certain things work.

I also believe that researchers have a responsibility to outline the limits of their own algorithms in research papers. (For example, presenting examples that aren't recognized or data sets on which the approach doesn't work at all.) That is valuable information and they almost certainly have it at the time of publication.

Not possible, unfortunately
I've occasionally found that SVMs work great for one shot learning if you have good features and a nicely labelled dataset. CNNs are really good at extracting features. Once you've extracted features that are generic, using an SVM as the last layer to train while keeping the CNN parameters intact yields great accuracy.
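
A heavily simplified sketch of that "frozen extractor plus trainable last layer" pattern, under loud assumptions: `frozen_features` is a hand-written stand-in, not a real CNN, and the classifier is trained on hinge loss (the SVM loss) without the regularization term, so it is a margin perceptron rather than a full SVM solver. The toy task and all names are hypothetical.

```python
import math
import random

rng = random.Random(7)

def frozen_features(point):
    """Stand-in for a pretrained, frozen extractor (NOT a real CNN)."""
    x, y = point
    return (x * x, y * y)

def sample_ring(r_min, r_max, n):
    points = []
    for _ in range(n):
        r = rng.uniform(r_min, r_max)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# Toy task: inner disc (label -1) vs. outer ring (label +1).
# Not linearly separable in (x, y), but separable in the frozen features.
data = ([(p, -1) for p in sample_ring(0.0, 1.0, 100)]
        + [(p, +1) for p in sample_ring(2.0, 3.0, 100)])
features = [(frozen_features(p), label) for p, label in data]

def train_hinge(samples, lr=0.1, max_epochs=500):
    """Train only the final linear layer; the extractor is never updated."""
    w, b = [0.0, 0.0], 0.0
    order = list(samples)
    for _ in range(max_epochs):
        rng.shuffle(order)
        updated = False
        for phi, label in order:
            margin = label * (w[0] * phi[0] + w[1] * phi[1] + b)
            if margin < 1.0:  # hinge loss is active: take a subgradient step
                w[0] += lr * label * phi[0]
                w[1] += lr * label * phi[1]
                b += lr * label
                updated = True
        if not updated:  # every sample already has margin >= 1
            break
    return w, b

w, b = train_hinge(features)
correct = sum(1 for phi, label in features
              if (1 if w[0] * phi[0] + w[1] * phi[1] + b > 0 else -1) == label)
accuracy = correct / len(features)
print(f"training accuracy on frozen features: {accuracy:.2f}")
```

With a real network, the stand-in would be replaced by the activations of a late layer, and an off-the-shelf SVM implementation would normally replace the hand-rolled training loop.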

I think that's where we are really headed. A combination of deep learning, boosted trees, SVMs, evolutionary algos, knowledge graphs, etc., all stitched together to build stronger AI systems.

Remember our aeroplanes don't flap wings but still carry tonnes of weight and fly half way around the world. Once we discovered fundamentals of aerodynamics a lot of supernatural things were possible.

Same with intelligence, once we discover the essentials of intelligence and mathematically formulate it, supernatural intelligence is very possible. This is the thing that really scares people. I have no idea how close we are to it, but I'm sure it will change society the way internet and mobile phones changed the world.

Wow, I had never considered superintelligence that wasn't at least at some level modeled after the human brain. That is crazy to think about. We could be at the very low end of the spectrum of intelligence I guess.
Homo sapiens is the dumbest creature able to spawn a civilization that evolution could produce.

--Accelerando

That's a good comment, and yes, SVMs are very powerful in themselves. They might not be "deep learning", but they're more powerful than linear learning and good for a lot of cases (the last layer, as you mentioned, is a good use case).

Yes, we'll have GAs building CNN architectures, or a mix of several techniques, I'm enthusiastic for what the future holds

> I'm sure it will change society the way internet and mobile phones changed the world.

It will change the entire world the way humans changed the world. And that's scary.

Kaggle has already proven hundreds of times over that deep learning is not a silver bullet.
Thanks, I'm familiar with Kaggle and how most of the time a Random Forest (or XGBoost, or something like Vowpal Wabbit) will solve your problem
True - until some clever guy proves us all wrong and finds ways to train some multidimensional/complex/deep/... kernel/forest/swarm/... that can learn those nonlinearities that currently only deep nets can be trained to detect (essentially, due to their relative simplicity, I'd say) :-)
I don't think we'll see a deep svm, but if we see one I think we'll have something very powerful

Same for a deep decision tree (forest?). Or maybe a combination of several techniques, etc

Probably comes down to whether the model can be trained with gradient descent (at least in the short term).

A general pre-trained RL guided architecture search (#1) together with more choices of nonlinearity (#2), feature extraction (#3), pooling and memory augmentation (#4) and other tricks (#5) could be very powerful across many domains. Make it able to accept multiple pre-trained models as priors and we're well on our way to general AI or at least a place where most data-scientists could be automated away.

(#1 deepmind had a demo a year back or so that was quite novel) (#2 vaguely remember someone training decision trees with gradient descent; could definitely see a 'randomforest' layer appearing in the middle of deep nets) (#3 just convolutions + tricks really). (#4 neural turing machine etc) (#5 any attention mechanism/any sequence mechanism (rnn/lstm etc)/ any graph relational understanding like the recent deepmind paper).

One of the greatest clear and present dangers of AI is that various existing algorithms are called just that, rather than what they are: statistical analysis algorithms, or, in short, statistics. Statistics used to be what we called the worst kind of lie; now it's becoming associated with intelligence, hinting at the ability to expose some great hidden truth. The problem lies not only with the algorithms, but with the models they learn (which are indirectly shaped by the algorithms' limitations) that are simplistic to begin with. E.g., they are trained to predict behavior based on a snapshot of statistical data, using either a constant model (which assumes behavior doesn't change over time) or some simplistic first-order model of change. They certainly aren't usually trained to take into account long-term changes or how their own recommendations impact behavior. The result is a powerful yet completely unjustified boost to the public image of statistical data with simplistic change models.
This. I still cannot forget the disappointment of my parents and some family friends, all retired scientists or MDs, when I explained to them how deep learning and natural language processing work a few years ago. They were truly upset that all this was "nothing more than clever accounting and statistics" at the end of the day, with no trace of the "advertised intelligence" - with Hinton's RBMs maybe coming closest, but by the time I was explaining how you use MCMC to train a Boltzmann machine, they again were complaining that even this is just modeling "statistical likelihoods, not true intelligence"...

In essence, we are only modeling patterns and their transformations, even if rather complex ones. But even the most basic prokaryote can model patterns, that has nothing to do with intelligence or consciousness per se. (And please don't get me started on swarm intelligence now... :-))

Perhaps the problem simply lies in calling them neural networks.
This terminology goes back to McCulloch and Pitts in 1943, who said they were making an analogy or model based on the behavior of biological neurons.

https://en.wikipedia.org/wiki/Artificial_neuron#History

There are many things that are inexact about this analogy or model, and many of them were known to be inexact in 1943, but that was the direct inspiration.

Apparently there are lots of different mathematical models available about biological neuron behavior:

https://en.wikipedia.org/wiki/Biological_neuron_model

Turns out it's very hard to model a thing when we don't know how it actually works.
To be fair, we do understand how neurons work, at least on a singular level. Perceptrons model that quite well.
Implementing a basic perceptron classifier is an undergrad homework assignment. Biological modeling of neurons is a work of decades:

http://www.genesis-sim.org/

https://www.neuron.yale.edu/neuron/what_is_neuron
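
For scale, the "undergrad homework" side of that comparison fits in a few lines. This is a minimal sketch of the classic perceptron learning rule on a toy AND gate; the data set, epoch count and function names are chosen here for illustration and say nothing about how biological neurons or the simulators linked above work.

```python
def predict(weights, bias, inputs):
    """Threshold unit: output 1 if the weighted sum exceeds zero."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=10, lr=1.0):
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is 0 when the prediction is right, so nothing changes then
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_GATE]
print(predictions)  # prints [0, 0, 0, 1]
```

The whole model is a weighted sum and a threshold, with weights that only change during training.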

McCulloch's argument was that perhaps the gross behaviour of a NN as layers of simple transfer functions is where the real action is, and the rest of the details are just gravy.

The fact we now give this to undergrads as homework suggests that there was some value to this idea.

But how does a neuron decide to grow new axons or how to change input weights? Biological neurons do this when solving tasks and not just during training. Isn't it possible that human-like intelligence depends on the network being dynamic? For example, when you play a game for the first time a lot of things suddenly start to click; couldn't that be the result of new connections forming or at least some weights being changed? If this is true then it would be impossible to create a general game playing AI with human-like performance using our current model.
Biological neurons are fundamentally different from the models used in deep learning. They can have multiple outputs, can span the whole brain and do local protein-based computations we don't really understand yet. What we have in the perceptron is just a very simple model based on what we observed using rudimentary electricity detectors.
Don't fully connected layers do exactly what you describe?
As well as the title "artificial intelligence".
One can say that the human mind consists of millions of not very special parts. It's the aggregate, the complexity with which they interact, that makes it special.

Once you start to connect all these seemingly non-special abilities in deep learning the "magic" starts to happen. You get something that is more than the sum of its parts. Of course it's not DL in itself that's interesting but the potential emergent complex relationships.

That's just another version of the trap GP spoke about. About a decade ago everybody was expecting emergent complex behavior from all kinds of evolutionary, intelligent ("swarm") systems. Didn't happen, seen that.

https://en.m.wikipedia.org/wiki/Swarm_intelligence

About a decade ago winning at Go or self-driving cars were seen as pipe dreams many decades away. Yet here we are.

The author is making the mistake of thinking that just because he can show some areas where we aren't as far along as we thought, he has made an argument against AI.

That's not how it works. We don't get to decide what the right metrics are. All we can see is that we keep making progress, sometimes in large leaps, sometimes slowly.

I always find it fascinating that we have no problem accepting the idea that human consciousness evolved from basically nothing but the most elementary building blocks of the universe, and that once we became complex enough we ended up being conscious, yet somehow the idea of technology going through the same process, just in a different medium, seems impossible to many.

I know where my bet is at least, and I haven't seen anything to counter it, including the OP's essay.

The fallacy there is glorifying consciousness. Full consciousness as in omniscience is an unachievable ideal. If we ascribe consciousness to ourselves, depending on the individual theory of conscious thought, that's likely faulty in some respect already.
I don't see anyone glorifying consciousness, especially not as some omniscient ideal. In fact I only see people arguing that consciousness isn't really the goal or the focus here, but rather that you can't talk with any certainty about whether or not it's possible. You can however point to the fact that we are making progress towards more and more complex relationships and that this looks very much like how we became conscious. That's all really.
>Didn't happen,

...yet. See my comment here: https://news.ycombinator.com/item?id=14770230

"Never" is a strong prediction. But yes, ANNs have nothing in common with BNNs (biological ... :-)) at all, other than taking them as a very rough abstraction for teaching the basic intuition of the chained up tensor transformations.

The hard thing is to predict the when, or even the if, of AI. If it will happen, it will be a sudden, light-switch like moment. I don't think AI can happen gradually. At least the first artificially sentient entity will be a moment much like the singularity some love to predict for the near future...

But as to when that moment will occur, or even if, I think we have no real data that shows we are any closer today than say 10 or 30 years ago. Pattern matching, no matter how complex, isn't "all there is" to intelligence and consciousness.

EDIT: OP changed his reply from "will never happen" to "hasn't happened yet" while I was replying, explaining why mine might read a bit strange now... :-)

> If it will happen, it will be a sudden, light-switch like moment. I don't think AI can happen gradually.

But our own intelligence happened gradually.

Human intelligence at the individual level evolved pretty gradually, but there hasn't been enough time for biology to explain our advancement in the last 10,000 years or 500 years. Culture and social organization are the essential nurturing factors there.

Every human genius would be out foraging for roots, perhaps reinventing the wheel or the lever, if it grew up without the benefit and influence of a society that makes greater achievement possible. Modern science and high technology that we attribute to human intelligence are really the products of a superintelligence (not to be conflated with consciousness) acting through us as appendages.

I think it's entirely possible (even likely) that all of the components of a new computational superintelligence already exist, but they are still "hunting and gathering" in the halls of academia or the stock market or biotech or defense...

Has anyone been able to do this? Is anyone working on it?

I only follow the field as a hobby, but as far as I can tell we are nowhere near getting to this point. I think the ability to combine all these parts in a way that the sum is greater than its parts is going to require many many breakthroughs still.

The thing is that it's most likely not something anyone does per se but something that happens with enough complexity.

If you happen to believe evolutionary theory is the most convincing then we weren't built either but a byproduct of emergent complexity.

It is my belief that humans are pattern recognizing feedback loops and carriers of information. We externalized some of that into books and built libraries to be able to keep even more than humans can remember as individuals and now have technology to save even more information and even manipulate it in ways impossible up until 80 years ago or so.

I am fairly certain that technology is part of nature and that technology-based consciousness will be nothing like our limited consciousness but something rather different. The end result will not be like humans, just better; it will be nothing like humans at all, but much better at the carrying-of-information part.

And so with that perspective (my personal belief) in mind, no one is going to be able to do it; it will happen as a by-product.

Please keep in mind that I am saying "we externalized" in the same way we say "selfish genes": it's not a conscious effort as such but rather something which happens to be favored in the game of life.

Why that is I have no idea but I am fairly certain humans aren't the last species. But yes it's all very speculative I just haven't been able to find better explanations for now.

The problem is that we don't really know for sure. We kind of predict things by extrapolating what we know and what we have, but we can never be sure there won't be any sudden breakthroughs.
I agree with your position. But I want to add a warning against the humanization of the brain. Many parts of it are complex in unknown ways, but some parts are truly mechanical.

The parts of your central nervous system that respond to reflexes, that locate the source of sound or parse the color of retinal input are far more similar to deep learning algorithms than they are to what we think of as human consciousness.

Because that has nothing to do with consciousness... Every living cell can perceive such inputs, even the simplest of prokaryotes can "sniff" out their food sources.
> Many parts of it are complex in unknown ways, but some parts are truly mechanical.

I feel like this is a bit of a false dichotomy. We've never encountered any spooky non-mechanical non-physical part of the brain, and we've been looking since Cartesian dualism was in vogue.

What we think of as human consciousness is likely just a bunch of feedback loops allowing the brain to analyze some of its own state as if it were an external entity.

The same oversimplification could have been made of the visual system before we became aware of specialized cortical units and their federated/hierarchical arrangement.

In time I suspect we'll yet discover that much of the brain is inhomogeneous in unexpected ways and peculiarly interconnected. If it were not, we'd understand more about how it works by now.

It isn't anthropomorphizing. There are undeniable architectural similarities between ANNs and biological neural networks. We don't understand either very well yet, but the parts we do understand have led to a lot of cross-pollination. I don't think computational intelligence will ever match biological networks detail by detail due to the different substrates and resource usage tradeoffs, and they don't need to match. Intelligence can develop in different ways and we are learning about the universal aspects of it.
This is exactly my point - the danger of "anthropomorphization" lies in taking the brain analogy too far. That is, there shouldn't necessarily be a link between research in neuroscience and advances that make deep learning models more accurate. The tasks are completely different (human learning vs. minimizing a loss function), and it's important for researchers in both fields - neuroscience and AI - to keep that in mind.
However, there definitely are analogies! E.g. early work in convnets was inspired by the architecture of cat brains.

I think the fields have useful things to say to each other, but we're getting over a (maybe justified) taboo in talking about machine learning methods being biologically inspired.

The origins of that analogy are very flimsy:

1) Hubel and Wiesel discovered simple and complex cells in the cat's V1 in the 60s. They came up with an ad hoc explanation that somehow the complex cells "pool" among many simple cells of the same orientation. No one to date knows how such pooling would be accomplished (that selects exactly simple cells of similar orientation and different phase, not vice versa), or whether that pooling occurs only in V1 or elsewhere in the cortex.

2) Fukushima expanded that ad hoc model into the neocognitron in the 80s, though there is exactly zero evidence for similar "pooling" in higher cortical areas. In fact, higher cortical areas are essentially impossible to disentangle and characterize even today.

3) Yann LeCun took the neocognitron and made a convnet which worked OK for MNIST in the late 80s. Afterward the thing was forgotten for many years.

4) A few years ago Hinton and some dude who could write good GPU code (Alex Krizhevsky) took the convnet and won ImageNet. That is when the current wave of "AI" started.

In summary, convnets are very loosely based on an ad hoc explanation of Hubel and Wiesel's findings in primary visual cortex, which today in neuroscience are regarded as "incomplete" to say the least (more likely completely wrong). Now this stuff works to a degree, but really all these biological inspirations are very minimal.

How do you know your brain's not minimizing a loss function?
For the analogy to hold, it's more of a question of whether or not ML algorithms operate in the same way as the brain. Right now, ML models use algorithms from continuous optimization that require certain structure. Namely, we require a Hilbert space, so that we can define things like derivatives and gradients. This puts certain requirements on the kinds of functions that we can minimize and the kinds of spaces that we can work with. These requirements are difficult to find precise analogies for in biology. What does it mean to have an inner product in the brain? What does twice continuously differentiable mean in the context of a neuron? Even if there is a minimization principle, which I am not sure there is or is not, if ML uses algorithms which are fundamentally not realizable in biology, how can we say it replicates the brain?
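To make concrete what structure is being assumed here, a minimal sketch of plain gradient descent follows. The toy `loss`, its hand-written `grad`, and the `step` and `iters` values are all hypothetical choices for illustration; the point is only that the update rule needs a gradient to exist at every step.

```python
def loss(w):
    # Smooth toy loss (hypothetical) with its minimum at w = (3, -1).
    return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2


def grad(w):
    # Analytic gradient of the loss above; it exists only because
    # the loss is differentiable everywhere.
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]


def gradient_descent(w, step=0.1, iters=200):
    # Repeatedly move against the gradient direction.
    for _ in range(iters):
        g = grad(w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w


w_min = gradient_descent([0.0, 0.0])
print(w_min, loss(w_min))
```

If `loss` had no well-defined gradient (for example, a purely discrete or symbolic objective), the update line would have nothing to compute, which is the structural requirement the comment is pointing at.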
Based on what goes on in every cell in our bodies when it comes to the information processing involved with DNA, I don't think there is any such algorithm which is fundamentally not realizable in biology. I'll grant you, I don't think biological neurons are calculating derivatives across connection strengths, but there must be some analogous process to control neural connection strengths.
That may very well be and I think it's a fantastic area to do research on. Namely, can we accurately model the body with an algorithmic process and what does this process look like? However, unless ML directly mirrors the algorithms involved in the body, the models used by the body, and the misfit function used by the body, which together already assume that the body really does operate on a strict minimization principle, then I contend it's improper to anthropomorphize the algorithms. They're good algorithms, but a better name would be empirical modeling since we're creating models from empirical data.
You might find a slide of my talk interesting:

https://ibb.co/fXAn4a

You have to read it from left to right with a twinkling eye of course ;)

In your slide - why is backpropagation a further stretch from a true bio-NN than an ANN without backpropagation?
An ANN still resembles major features of a bio-NN.

1. A network

2. Flow of information is mainly unidirectional through a node

3. Multiple inputs, but one output, which is connected to the inputs of other neurons.

4. The connection strength between 2 neurons can be changed.

5. Non-linear behavior.

After all, I think, this is not such a bad first approximation. Hence the picture in the middle.
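The five features listed above can be sketched in a few lines. This is a toy illustration only: the `Neuron` class, the sigmoid activation, and all the weight values are hypothetical choices, not a claim about how any real library or biological neuron works.

```python
import math


class Neuron:
    """Toy artificial neuron: many inputs, one output, adjustable weights."""

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)  # connection strengths (point 4)
        self.bias = bias

    def output(self, inputs):
        # Weighted sum of multiple inputs into a single value (point 3) ...
        z = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        # ... passed through a non-linear activation (point 5).
        return 1.0 / (1.0 + math.exp(-z))


# A tiny network (point 1): two neurons feed a third, in one direction (point 2).
h1 = Neuron([0.5, -0.4])
h2 = Neuron([-0.3, 0.8])
out = Neuron([1.2, -0.7])

x = [1.0, 2.0]
y = out.output([h1.output(x), h2.output(x)])
print(y)

# Changing one connection strength changes the network's output.
out.weights[0] = -1.2
y_changed = out.output([h1.output(x), h2.output(x)])
print(y_changed)
```

Note that nothing here says how the weights should be changed; that is exactly the part (backpropagation) the comment questions.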

But I cannot believe that we learn by comparing thousands or millions of input and output patterns and backpropagating the error through the network to perform a gradient descent at the neurons. That is simply not what our brain does.

When there is feedback in neurons, what do you think that conveys?

I agree it is not some simple error correction like what is propagated backwards, but it happens often and I presume it's something useful or it wouldn't be there.

Top down predictions are likely mediated by feedback connections from higher to lower areas. Functions include possibly encoding a generative prior for prediction, speeding up inference. They also play an important role in coding more informative error signals than simple derivatives and are part of how the brain learns even as it predicts.
This is only true because we don't know how the brain actually works. But the NN architecture is not unreasonable, it maps structures seen in the brain. Backpropagation is also reasonable to abstract the changes in gene and protein regulation (e.g. how learning could be encoded).
Well said. It's just curve fitting.
Maybe everything is "curve fitting." -- Note: I think it's more hierarchical than that but curve fitting is certainly one of the important capabilities of biological systems.
I don't think so. There's an incredibly important art and science to model selection that is not encapsulated in curve fitting. For example, say we observe a boy throwing a ball and we want to predict where the ball will land. From basic physics, we know the model is `y = 0.5 a t^2 + v0 t + y0` where `a` is the acceleration due to gravity, `v0` is the initial velocity, and `y0` is the initial height. After observing one or two thrown balls, even with error, we can estimate the parameters `a`, `v0`, and `y0` relatively well. Alternatively, we could apply a generic machine learning model to this problem. Eventually, it will work, but how much more data do we need? How many additional parameters do we need? Do the machine learning parameters have physical meaning like those in the original model? In this case, I contend the original model is superior.
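As a rough illustration of how little data the physical model needs, here is a sketch that fits `y = 0.5 a t^2 + v0 t + y0` to a dozen noisy samples from one simulated throw. The "true" values, the noise level, and the helper `fit_quadratic` are all assumptions made up for this example.

```python
import random


def fit_quadratic(ts, ys):
    """Least-squares fit of y = c2*t^2 + c1*t + c0 via the normal equations."""
    # Power sums S[k] = sum(t^k) and moments T[k] = sum(y * t^k).
    S = [sum(t ** k for t in ts) for k in range(5)]
    T = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (c0, c1, c2).
    A = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            A[r] = [x - f * p for x, p in zip(A[r], A[col])]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        c[i] = (A[i][3] - sum(A[i][j] * c[j] for j in range(i + 1, 3))) / A[i][i]
    return c  # [c0, c1, c2]


# Hypothetical throw: a = -9.81 m/s^2, v0 = 6 m/s, y0 = 1.5 m.
A_TRUE, V0_TRUE, Y0_TRUE = -9.81, 6.0, 1.5
rng = random.Random(0)
ts = [0.1 * i for i in range(12)]
# Simulated height measurements with small Gaussian measurement error.
ys = [0.5 * A_TRUE * t ** 2 + V0_TRUE * t + Y0_TRUE + rng.gauss(0, 0.01) for t in ts]

y0_hat, v0_hat, c2 = fit_quadratic(ts, ys)
a_hat = 2.0 * c2  # the t^2 coefficient is 0.5 * a
print(f"a = {a_hat:.2f}, v0 = {v0_hat:.2f}, y0 = {y0_hat:.2f}")
```

Only three numbers are estimated, and each one has a physical meaning. A generic model fit to the same twelve points would have no such interpretation.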

Now, certainly, there are cases where we don't have a good or known model and machine learning is an extremely important tool for analyzing these cases. However, the process of making this determination and choosing what model to use is not solved by curve fitting or machine learning. This is a decision made by a person. Perhaps some day that will change, and that will be a major advance in intelligent systems, but we don't have that now and it's not clear to me how extending existing methods will lead us there.

Basically, I agree with the sentiment of the grandparent post. Machine learning is largely just curve fitting. How and when to apply a machine learning model vs another model is currently a decision left up to the user.

You're talking about the complexity of the model. If you take a purely input-output view of the world (which by the way, even classical Physics does), every problem _is_ curve fitting in a sufficiently high dimensional space. There is no _conceptual_ problem here. There is perhaps a complexity problem, but that's why I wrote that "I think it's more hierarchical than that."
I disagree. Many problem spaces are not continuous and can involve incomplete information that make a continuous model like a curve useless.

For instance, a linguistic model that lacks definitions for some words, or which allows too much ambiguity can leave sentences unparsable or uninterpretable. Disruptions to word order in sentences can lose sufficient information that no curve or fitment can recover it. A curve has to capture sufficient information for fitting it to be useful. I think not all concepts or relations are amenable to N-dimensional cartesian representation. (Though I'd like to see a reference confirming this.)

And hidden interdependence between dimensions can make any curve drawn in that coordinate space a misrepresentation of the actual info space, and any curve fit in it, dysfunctional.

Any mapping of info onto a Cartesian coordinate space presumes constraints that limit the utility of any function defined across that space. So no curve is guaranteed to be meaningful in "the real world" unless those assumptions are conserved upon reentry from the abstract world.

George Box's "All models are wrong, but some are useful" suggests that while fitting curves in wrong models may be possible, it well may be form without function.

>If you take a purely input-output view of the world (which by the way, even classical Physics does), every problem _is_ curve fitting in a sufficiently high dimensional space.

Not all spaces are Euclidean, and "purely input-output" still contains a lot of room for counterfactuals that ML models fail to capture.

What do you mean by counterfactuals? NNs are function approximation algorithms, in any geometry. No ifs ands or buts about it.
You seem to have replied on a tangent: how is what you describe not just "curve fitting"?

Humans didn't magic that model up: you're ignoring the huge amount of human effort over thousands of years that it took to arrive at that model. If we gave a ML algorithm a similar amount of time and asked it to construct a simple model of the situation, it might very well hand back the formula you presented.

Your entire post basically begs the question: it supposes that humans are doing something that isn't "curve fitting", and then uses that to argue that they do more.

What, specifically, are you supposing can't be done by "curve fitting"?

I believe the process for deriving fundamental physical models differs from the techniques used in ML. For example, say we want to use the principle of least action to derive an expression for energy similar to what Landau and Lifshitz derive in their book Mechanics. Here, we assume that the motion of a particle is defined by its position and velocity. We assume that the motion of the particle is defined by an optimization principle. We assume Galilean invariance. We assume that space and time are homogeneous and isotropic. Then, putting this all together, we can derive an expression for energy, `E = 0.5 m v^2`. At this point, we can validate our model with a series of experiments that curve fit this expression to the results.
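For readers who want the steps spelled out, the argument for a single free particle can be sketched roughly as follows. This is a condensed outline under the assumptions listed above, not a full proof; `L` is the Lagrangian, `m` the mass, and `\mathbf{v}` the velocity.

```latex
\begin{align*}
% Homogeneity of space and time: L has no explicit dependence on position or time.
% Isotropy of space: L depends on the velocity only through its magnitude.
L &= L(v^2) \\
% Galilean invariance: under v -> v + epsilon, L may change only by a total
% time derivative, which forces dL/d(v^2) to be a constant, written m/2.
L &= \tfrac{1}{2} m v^2 \\
% Homogeneity of time yields a conserved quantity, the energy:
E &= \mathbf{v} \cdot \frac{\partial L}{\partial \mathbf{v}} - L
   = m v^2 - \tfrac{1}{2} m v^2
   = \tfrac{1}{2} m v^2
\end{align*}
```

Each line uses one of the stated assumptions, which is why dropping an assumption (for example Galilean invariance) tells you which step of the derivation has to be redone.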

Alternatively, we could just run a bunch of experiments on data using ML models. Eventually, someone may have a wonderful idea and realize that we can just reduce the ML model into a parabola. Of course, this is due to intuition and not the ML model. Nevertheless, even though we end up at the same result, I contend the first result is different. It has a huge amount of information embedded into it about the assumptions we made about how the world works. When those assumptions are no longer satisfied, we have a rubric for constructing a fix. For example, if Galilean invariance no longer holds, we can fix the above model using the same sort of derivations to obtain relativistic expressions. Again, we could just throw more data at this new problem and fit an ML model to it, and perhaps someone would stare at this new model and realize that `E = m c^2`. However, I think that's discounting the embedded information in deriving these models and I don't think this information is present in ML models. ML models are generic. Our most powerful physical models are not.

Now, sure, once we have the models, we're just going to fit them to the data and it's all just curve fitting. Other fields call this parameter estimation, parameter identification, or a variety of other names. At that point it's all curve fitting. However, again, I contend the process for determining a new model is not.

Of course. "What do I fit this curve to" is a prerequisite to "what is the shape of this curve?"

You shouldn't feel the need to defend theory-based modeling against some imagined incursion from arrogant deep learning researchers. NNs work tremendously well in a few specific problem domains that we had no way to approach otherwise. Elsewhere, they're not much better than any other prediction algorithm. By the way XGBoost is curve-fitting, too.

You haven't explained how the first case isn't "curve fitting": the agents performing the compilation of those facts into the new fact are just spitting out the "best" fit string of symbols based on learned rules, etc. That is something computers can (theoretically) do, and which fits the description "curve fitting" just fine. School (and other education) is training the model they're using to do that compilation, but it's still just "curve fitting" based on reward/punishment signals.

What part of that can't an ML agent learn to do?

From my perspective, you're just describing the "higher order" layers of the network and pretending that humans aren't actually running those functions embedded on deep networks, then proclaiming that deep networks can't do it.

> Eventually, it will work, but how much more data do we need?

For a model that small, with so little variance (assume you measure correctly where the ball lands) it would be enough to do just a few throws to fit the parameters.

I hope Elon Musk understands that.
I am sure he does.
His public statements would indicate otherwise.
Consider that his public statements are made on the advice of his publicist, and that encouraging the AI hype is self-serving.
Or finding eigenvalues.
Is what you do not "just curve fitting"?
The anthropomorphization was done by academic researchers to gain/increase funding for themselves and the field. You can read the papers and see. This is commonly done for marketing purposes and is important since the pool of research money can be limited.
I always counter, the intelligence is not in the machine, but the builder. Anthropomorphism is in line with that, because it projects the human qualities onto the machine, because, in a broad sense, they are modelled after those. Egoistic as we are, that's the only way to understand anything, to remove the schism between animate and inanimate objects. Just like a fishing rod is just the extension of an arm.
> The model's intuition doesn't work like a human's

The model doesn't have intuition, it is just a series of computations.