by IIAOPSW 1326 days ago
I'll be bold enough to make the contrarian prediction. The approach of throwing ever more parameters into the model and ever more transistors onto the chip is at best a brute-force approach to AI and will likely plateau in effectiveness long before we get to "general purpose AI". We do not need 1nm neurons running at GHz rates and training on a corpus of everything ever said just to comprehend language. There needs to be an algorithmic breakthrough. There is likely already more than enough processing power.

Even bolder prediction: When we finally understand how the brain actually does it, the algorithmic improvement will be so enormous that the machine learning tasks which run on massive servers today will be able to run on the phone currently in your pocket.

4 comments

This view may ultimately be right, but it largely ignores the currently observed trends in capabilities increase[0], scaling laws[1], and things like grokking[2]. I'm seeing an increasing number of researchers (me included) moving to stances like: "there is a scary possibility that we may solve all the benchmarks we come up for AI... without understanding anything fundamentally deep about what intelligence is about. a bummer for those like me who are see AI as a fantastic way to unlock deeper insights on human intelligence" @Thom_Wolf [3]

[0] https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-thin...

[1] https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla...

[2] https://twitter.com/_akhaliq/status/1479265403142553601

[3] https://twitter.com/TacoCohen/status/1584499066410790912

I feel the same way. What if there’s a better way to use those transistors?

The semiconductor researchers spend a lot of effort to make ever-smaller transistors. What is a transistor? It’s a tiny switch.

The ML researchers meanwhile use the language of linear algebra to define mathematical transformations of real numbers with nice differentiability properties.

The chipmakers are then tasked with reconciling the two. So they use transistors to make gates. And gates to make adders. And adders to make integer multipliers. And integer multipliers to make floating point multipliers. And fp multipliers to implement matrix multiplication. And now you can run your cat diffuser model on those transistors.
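To make that tower of abstractions concrete, here is a minimal Python sketch that builds everything out of a single NAND "switch": gates, then a ripple-carry adder, then a shift-and-add integer multiplier, then a matrix multiply. All function names are made up for illustration, the floating point layer is skipped entirely, and real hardware uses far more optimized circuits, so treat it as a toy rather than a description of any actual chip.

```python
def nand(a, b):
    # The only primitive: stands in for a pair of transistor switches.
    return 1 - (a & b)

def xor(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def and_(a, b):
    t = nand(a, b)
    return nand(t, t)

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

def full_adder(a, b, carry_in):
    # Gates -> one-bit adder. Returns (sum_bit, carry_out).
    partial = xor(a, b)
    return xor(partial, carry_in), or_(and_(a, b), and_(partial, carry_in))

def add(x, y, width=8):
    # One-bit adders -> ripple-carry adder. Result wraps modulo 2**width.
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

def multiply(x, y, width=8):
    # Adders -> shift-and-add multiplier with a 2*width-bit product.
    # The `if` plays the role of the AND gating a hardware multiplier would use.
    acc = 0
    for i in range(width):
        if (y >> i) & 1:
            acc = add(acc, x << i, 2 * width)
    return acc

def matmul(A, B, width=8):
    # Multipliers -> matrix multiplication over small unsigned integers.
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc = add(acc, multiply(A[i][k], B[k][j], width), 2 * width)
            C[i][j] = acc
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Counting the `nand` calls behind a single 2x2 matrix product gives a feel for how many switches sit underneath one "multiply real numbers" step.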

But what is the chance that the configuration of transistors in a floating point multiplier is anywhere close to the most efficient transistor configuration for learning?

The only reason we’re using multiplication of real numbers is because the math people said so.

Since we are openly speculating, I think the missing ingredient is feedback loops. There is no explicit input side and output side of the brain. It's all just a ball of neurons. There is propagation delay between the neurons. This makes it possible to have self-sustaining loops of neurons firing. The longer the loop, the longer it takes to go full circle. We call this phenomenon "brain waves".
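The "longer loop, longer period" part can be sketched with a toy simulation: a ring of units where each unit simply relays a spike to the next one after a fixed delay. This is not a model of real neurons, and `simulate_ring` and its parameters are hypothetical names chosen for the example. It only shows that a single spike in such a loop comes back around every `n_neurons * delay_steps` time steps.

```python
from collections import deque

def simulate_ring(n_neurons, delay_steps, total_steps):
    """Return the time steps at which unit 0 fires.

    Unit i relays to unit (i + 1) % n_neurons through a delay line,
    so a spike fired at step t makes the next unit fire at t + delay_steps.
    """
    # Each delay line holds spikes that are still in transit.
    lines = [deque([0] * (delay_steps - 1)) for _ in range(n_neurons)]
    firing = [0] * n_neurons
    firing[0] = 1  # kick the loop with a single spike
    fired_at = []
    for t in range(total_steps):
        if firing[0]:
            fired_at.append(t)
        next_firing = [0] * n_neurons
        for i in range(n_neurons):
            lines[i].append(firing[i])
            # Whatever leaves the delay line makes the next unit fire.
            next_firing[(i + 1) % n_neurons] = lines[i].popleft()
        firing = next_firing
    return fired_at

print(simulate_ring(5, 3, 50))
print(simulate_ring(10, 3, 50))
```

Doubling the number of units in the ring doubles the time between firings of unit 0, which is the relationship between loop length and oscillation period described above.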

I think what we get wrong is that individual neurons rarely represent anything. They are a medium for the waves. The waves are the currency of thought. A brain is a series of electro-mechanical oscillators that resonate with abstract concepts and patterns.

AFAIK, most research is still using the old "neurons represent single things" paradigm. Someone needs to tell them, there's no such thing as a "grandmother neuron".

> When we finally understand how the brain actually does it

If you dig into how neurons work in the brain, you'll discover that a single neuron has the complexity of a large neural network internally and its behavior is not nearly as simple as the typical model explanation. Different ion channels, time-dependent behavior, up/down regulation of neurotransmitter receptors and release, and much more.

It is entirely possible that the brain "does it" by throwing vastly more computing resources at the problem than we previously believed.

I have, and while it's true a neuron does more than the simple on/off of its artificial peers, I'd hardly call it "the complexity of a large neural network internally". There just aren't that many bits needed to represent all the parameters you just mentioned. Stuff you mention like ion channels and neurotransmitters feels like excessively mimicking biological constraints rather than something actually relevant. Who cares if the real neurons use chemical channels and electrical channels to communicate? The artificial ones can just send the same information over electrical channels for everything.

While the brain has vastly more computing resources, they are probably operating very far from the theoretical optimum. There must be so much baggage, inefficiency and dead pathways left which do not have much purpose at all. The brain was evolved, not designed, which means that any random features/mutations which did not inhibit an individual's ability to procreate got to stay.

I think that even when we understand the brain completely, it will be very difficult to disentangle what is useful for artificial neural networks from what doesn't really matter.

The capabilities are already there. If the compute becomes more affordable, they will explode in usage. In fact, this is already happening. See live transcription on newer iOS devices for an example.

Scaling diagrams also showed no sign of plateauing AFAIK.