| HN Mirror

Hacker News

by bob1029 665 days ago

> Transformers required ~2.5x more training steps to achieve comparable performance, overfitting eventually.

> RNNs are particularly suitable for sequence modelling settings such as those involving time series, natural language processing, and other sequential tasks where context from previous steps informs the current prediction.

I would like to draw an analogy to digital signal processing. If you think of the recurrent-style architectures as IIR filters and feedforward-only architectures as FIR filters, you will likely find many parallels.

The most obvious one to me is that IIR filters typically require far fewer elements to produce the same response as an equivalent FIR filter. Granted, the FIR filter is often easier to implement, control, and measure in practical terms (FIR filters suit fixed-point arithmetic hardware much as feedforward ML architectures suit GPUs).
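To make the "fewer elements" point concrete, here is a minimal pure-Python sketch (an illustration, not something from the thread) comparing a one-pole IIR smoother, which uses just two coefficients, with FIR filters built by truncating its impulse response. The function names and the pole value `a = 0.99` are arbitrary choices for the example.

```python
# Sketch: a 2-coefficient IIR filter vs. FIR filters that approximate it by
# truncating its impulse response. Names and a = 0.99 are illustrative choices.

def one_pole_iir(x, a):
    """Recursive (IIR) smoother: y[n] = a*y[n-1] + (1 - a)*x[n]."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + (1 - a) * sample
        y.append(prev)
    return y


def fir(x, taps):
    """Non-recursive (FIR) filter: y[n] = sum_k taps[k] * x[n-k]."""
    return [sum(t * x[n - k] for k, t in enumerate(taps) if n >= k)
            for n in range(len(x))]


def truncated_taps(a, num_taps):
    """First num_taps samples of the IIR impulse response (1 - a) * a**k."""
    return [(1 - a) * a ** k for k in range(num_taps)]


def max_error(a, num_taps, length=1000):
    """Worst-case gap between the IIR and its truncated FIR on a unit impulse."""
    impulse = [1.0] + [0.0] * (length - 1)
    iir_out = one_pole_iir(impulse, a)
    fir_out = fir(impulse, truncated_taps(a, num_taps))
    return max(abs(p - q) for p, q in zip(iir_out, fir_out))


if __name__ == "__main__":
    for n_taps in (10, 100, 500):
        print(n_taps, "taps:", max_error(0.99, n_taps))
```

With N taps the FIR error only falls off as roughly (1 - a)·a^N, so a long-memory pole near 1 needs hundreds of taps to match what the IIR does with a single feedback coefficient.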

I don't think we get to the exponential scary part of AI without some fundamentally recurrent architecture. I think things like LSTM are kind of an in-between hack in this DSP analogy: you could look at it as an FIR filter with dynamic coefficients. Neuromorphic approaches seem like the best long-term bet to me in terms of efficiency.


lr1970 664 days ago

Again from signal processing: depending on the position of the poles in the z-transformed filter transfer function, the output of an IIR filter has a narrow stability region that is typically carefully designed for. Otherwise, IIR filters either exponentially decay to zero or exponentially grow to infinity. RNN cells like LSTM are "decaying filters" with non-linear gates introduced to stop the decay and to "remember" things.
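A one-pole example shows the behavior described above. This is a minimal sketch with a single real pole `p` (a simplifying assumption; practical filter designs usually have several, possibly complex, poles):

```python
# Sketch: impulse response of a single-pole recursion y[n] = p*y[n-1] + x[n].
# Its transfer function H(z) = 1 / (1 - p*z^-1) has one pole at z = p,
# so the impulse response is simply p**n.

def impulse_response(pole, steps):
    y, prev = [], 0.0
    for n in range(steps):
        x = 1.0 if n == 0 else 0.0   # unit impulse at n = 0
        prev = pole * prev + x       # feedback through the pole
        y.append(prev)
    return y


if __name__ == "__main__":
    for pole in (0.9, 1.0, 1.1):     # inside, on, and outside the unit circle
        samples = impulse_response(pole, 50)[::10]
        print(pole, [round(v, 3) for v in samples])
```

A pole with |p| < 1 decays toward zero, |p| > 1 grows without bound, and only |p| = 1 neither decays nor grows in magnitude, which is the knife edge the comment is pointing at.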

FIR filters are way simpler to design and can capture memory without hacks.
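A toy version of the gating idea from the comment above: the core LSTM cell-state update c[n] = f[n]*c[n-1] + i[n]*x[n], with the forget gate `f` and input gate `i` supplied by hand instead of computed from learned weights. This is a deliberate simplification; a real LSTM also has an output gate, tanh nonlinearities, and a hidden state.

```python
# Sketch: the LSTM cell-state recursion with hand-set (not learned) gates.

def run_cell(inputs, forget_gates, input_gates):
    """c[n] = f[n] * c[n-1] + i[n] * x[n]."""
    c, states = 0.0, []
    for x, f, i in zip(inputs, forget_gates, input_gates):
        c = f * c + i * x
        states.append(c)
    return states


inputs = [5.0] + [1.0] * 19   # one important value, then 19 distractors

# Constant forget gate < 1 and an always-open input gate: a plain decaying
# one-pole filter, so the 5.0 fades and recent inputs take over.
decaying = run_cell(inputs, [0.7] * 20, [1.0] * 20)

# Forget gate held at 1, input gate open only at n = 0: the decay is stopped
# and later inputs are blocked, so the first value is "remembered".
remembering = run_cell(inputs, [1.0] * 20, [1.0] + [0.0] * 19)

if __name__ == "__main__":
    print(decaying[-1], remembering[-1])
```

The decaying cell settles near 1 / (1 - 0.7) ≈ 3.33, the steady state for a constant input of 1, while the gated cell still holds 5.0 after 20 steps.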

wslh 665 days ago

ELI5: Could you explain what neuromorphic approaches mean, and how they contribute to AI/AGI? My first impression as a layperson (probably wrong) is that this approach resembles ideas from the book "The Society of Mind", where the system isn't just simulating neurons but involves a variety of methods and interactions across "agents" or sub-systems.

bob1029 665 days ago

Neuromorphic mostly just means "like how the brain works". It encompasses a variety of software & hardware approaches.

The most compelling and obvious one to me is hardware purpose-built to simulate spiking neural networks (SNNs). In the happy case, SNNs are extremely efficient, consuming almost no energy. You could fool yourself into thinking we can just do this on the CPU due to the sparsity of activations, and I think there is even a set of problems this works well for. But in the unhappy cases, SNNs are impossible to simulate on existing hardware. Neuronal avalanches follow a power-law distribution, and meaningfully large ones would require very clever techniques to simulate with any reasonable fidelity.
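For readers unfamiliar with SNNs, here is a minimal sketch of a discrete-time leaky integrate-and-fire (LIF) neuron, a common textbook spiking model. The model choice and the `leak` and `threshold` values are assumptions for illustration, not something specified above. It shows the sparsity point: the neuron integrates input every step but emits spikes only occasionally, and in an event-driven system downstream neurons only do work when a spike arrives.

```python
# Sketch: discrete-time leaky integrate-and-fire neuron (assumed model and
# parameter values, chosen only to illustrate sparse spiking).

def simulate_lif(input_current, leak=0.9, threshold=1.0):
    """v[n] = leak * v[n-1] + I[n]; spike and reset v to 0 when v >= threshold.
    Returns the time steps at which the neuron spiked."""
    v, spikes = 0.0, []
    for n, current in enumerate(input_current):
        v = leak * v + current      # leaky integration of the input
        if v >= threshold:
            spikes.append(n)        # the only "event" downstream neurons see
            v = 0.0                 # reset after firing
    return spikes


if __name__ == "__main__":
    spikes = simulate_lif([0.15] * 100)
    print(f"{len(spikes)} spikes in 100 steps at {spikes}")
```

With a constant input of 0.15 the membrane potential needs 11 steps to cross threshold, so only 9 of 100 steps produce a spike; with an input of 0.05 the potential settles at 0.5 and the neuron never fires.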

> the system isn't just simulating neurons but involves a variety of methods and interactions across "agents" or sub-systems.

I think the line between "neuron" and "agent" starts to get blurry in this arena.

seanhunter 665 days ago

We somehow want a network that is neuromorphic in structure but we don't want it to be like the brain and take 20 years or more to train?

Secondly, how do we get to claim that a particular thing is neuromorphic when we have such a rudimentary understanding of how a biological brain works, or of how it generates things like a model of the world, an understanding of self, etc.?

planetpluta 665 days ago

Something to consider is that it really could take 20+ years to train like a brain. But once you’ve trained it, you can replicate at ~0 cost, unlike a brain.

kybernetikos 665 days ago

> we don't want it to be like the brain and take 20 years or more to train?

Estimates put GPT-4's training at something like 2,500 GPU-years, spread over about 10,000 GPUs. 20 years would be a big improvement.
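Taking those estimates at face value, and idealizing that every GPU was busy for the entire run, the implied wall-clock time is:

$$
\frac{2500\ \text{GPU-years}}{10\,000\ \text{GPUs}} = 0.25\ \text{years} \approx 3\ \text{months}
$$

So the 2,500 figure measures total compute, not elapsed time.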

seanhunter 664 days ago

1 GPU year is in no way comparable to 1 chronological year of learning for a human brain though.

kybernetikos 664 days ago

Yes, but the underlying point is that in this case you can train the AI in parallel, and there's a decent chance this or something like it will be true for future AI architectures too. What does it matter that the AI needs to be trained on 20 years of experiences if all of those 20 years can be experienced in 6 months given the right hardware?
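As arithmetic, compressing 20 years of experience into 6 months means processing it about 40 times faster than real time, for example by splitting it across roughly 40 parallel learners (an idealization that ignores any sequential dependence between experiences):

$$
\frac{20\ \text{years of experience}}{0.5\ \text{years of wall-clock time}} = 40
$$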

wslh 665 days ago

My take, for pragmatic reasons rather than how the brain actually works, is that an agent-based architecture is great because some tasks can be solved more effectively by specific algorithms or workflows rather than operating at the low level of neural networks (NN).

mafribe 665 days ago

Neuromorphic computing has been an ongoing failure (for general-purpose processors or even AI accelerators) ever since Carver Mead introduced (and quickly abandoned) it nearly half a century ago. Bill Dally (NVIDIA Chief Scientist) concurs: "I keep getting those calls from those people who claim they are doing neuromorphic computing and they claim there is something magical about it because it's the way that the brain works ... but it's truly more like building an airplane by putting feathers on it and flapping with the wings!" (Source: "Hardware for Deep Learning", Hot Chips 2023 keynote.)

We have NO idea how the brain produces intelligence, and as long as that doesn't change, "neuromorphic" is merely a marketing term, like Neurotypical, Neurodivergent, Neurodiverse, Neuroethics, Neuroeconomics, Neuromarketing, Neurolaw, Neurosecurity, Neurotheology, Neuro-Linguistic Programming: the "neuro-" prefix suggests a deep scientific insight in order to fool the audience. There is no hope of us cracking the question of how the human brain produces high-level intelligence in the next decade or so.

Neuromorphic does work for some special purpose applications.

chasd00 665 days ago

I like the feather analogy. Early on, all humans knew about flight came from biology (watching birds fly), but trying to make a flying machine modeled after a bird would never work. We can fly today, but plane designs are nothing like biological flying machines. In the same way, all we know about intelligence comes from biology, and trying to invent an AGI modeled on biological intelligence may be just as impossible as a plane designed around how birds fly.

/way out of my area of expertise here

quotemstr 665 days ago

And it's only now, having built our own different kind of flying machine, that we understand the principles of avian flight well enough to build our own ornithopters. (We don't use ornithopters because they're not practical, but we've known how to build them since the 1960s.) We would have never gotten here had we just continued to try to blindly copy birds.

fennecfoxy 665 days ago

I love this book and have it sitting on my shelf right now! I read it when I was a kid and was amazed at the ideas in it. Nowadays it's clearer to me that the author only had a rough grasp of how things like that would be built, but it's still cool nonetheless.

I would highly recommend it to people who love a good "near future" sci-fi book.

bwanab 665 days ago

I'm sure you know this, but I think "the author", Marvin Minsky, should be mentioned by name, since he was one of the foundational theorists of AI in general and of neural networks in particular.

x3haloed 665 days ago

> I don’t think we get to the exponential scary part of AI without some fundamentally recurrent architecture

I’ve been thinking the same for a while, but I’m starting to wonder if giant context windows are good enough to get us there. I think recurrence is more neuromorphic, and possibly important in the longer run, but maybe not required for superintelligence (SI).

I’m also a layman with only a surface-level understanding of these things, so I may be completely ignorant and wrong.

manjunaths 665 days ago

Can we even implement IIR filters with good performance and scaling on current architectures like GPUs?

bob1029 664 days ago

I don't think so. FIR filters can be unrolled and parallelized over the data, and they can definitely be run on a GPU to great effect. But IIR filters constantly depend on the output of the prior time step, so you can't unroll anything. These would probably be faster to simulate on the CPU.
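A small sketch of the dependency structure described above, with plain Python standing in for GPU threads (the function names are illustrative). The sequential dependence holds for this direct IIR loop; as the replies in this thread point out, linear recurrences can still be parallelized with other techniques.

```python
# Sketch of the dependency structure only; plain Python stands in for GPU threads.

def fir_at(x, taps, n):
    """One FIR output: reads inputs only, never other outputs."""
    return sum(t * x[n - k] for k, t in enumerate(taps) if n >= k)


def fir_any_order(x, taps):
    """Because FIR outputs are independent, they can be filled in any order
    (here backwards); on a GPU each n could be its own thread."""
    out = [0.0] * len(x)
    for n in reversed(range(len(x))):
        out[n] = fir_at(x, taps, n)
    return out


def iir_sequential(x, a):
    """One-pole IIR y[n] = a*y[n-1] + x[n]: each output needs the previous
    output, so this direct loop has to run in order."""
    out, prev = [], 0.0
    for sample in x:
        prev = a * prev + sample
        out.append(prev)
    return out


if __name__ == "__main__":
    print(fir_any_order([1.0, 2.0, 0.0, -1.0, 3.0], [0.5, 0.25]))
    print(iir_sequential([1.0, 0.0, 0.0], 0.5))
```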

shaklee3 664 days ago

See my other comment in this thread. It's definitely doable and very fast.

shaklee3 664 days ago

Yes. See this paper: http://cs.txstate.edu/~mb92/papers/asplos18.pdf

And things have improved a lot since then.
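The usual trick that makes recurrences GPU-friendly is to treat each step of a linear recurrence y[n] = a[n]*y[n-1] + b[n] as an affine map and combine those maps with an associative operator, which permits a parallel prefix scan. Below is a minimal sketch of that general technique (a Hillis-Steele scan; the names are illustrative). It is not claimed to be the exact method of the linked paper, only the standard idea behind parallelizing linear recurrences such as the one-pole IIR filter (the case a[n] = a for all n).

```python
# Sketch: parallelizing a linear recurrence y[n] = a[n]*y[n-1] + b[n] (y[-1] = 0)
# with an associative scan. General technique, not necessarily the paper's algorithm.

def combine(earlier, later):
    """Compose two affine steps y -> a*y + b: apply `earlier`, then `later`.
    later(earlier(y)) = a2*(a1*y + b1) + b2 = (a1*a2)*y + (a2*b1 + b2)."""
    a1, b1 = earlier
    a2, b2 = later
    return (a1 * a2, a2 * b1 + b2)


def inclusive_scan(steps):
    """Hillis-Steele scan: about log2(n) rounds. Within a round every update
    reads only the previous round, so all of them could run in parallel."""
    result = list(steps)
    offset = 1
    while offset < len(result):
        result = [combine(result[i - offset], result[i]) if i >= offset else result[i]
                  for i in range(len(result))]
        offset *= 2
    return result


def recurrence_parallel(a, b):
    # With y[-1] = 0, the additive term of each prefix composition is exactly y[n].
    return [y_n for _, y_n in inclusive_scan(zip(a, b))]


def recurrence_sequential(a, b):
    out, prev = [], 0.0
    for a_n, b_n in zip(a, b):
        prev = a_n * prev + b_n
        out.append(prev)
    return out


if __name__ == "__main__":
    print(recurrence_parallel([0.5] * 4, [1.0] * 4))
    print(recurrence_sequential([0.5] * 4, [1.0] * 4))
```

Each round does O(n) work, so this simple version costs O(n log n) total work versus O(n) for the sequential loop; work-efficient scans (e.g. Blelloch's) bring the total back to O(n) while keeping the logarithmic depth.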