HN Mirror

	by retrofrost 812 days ago
	This is amazing work, but to me it highlights some of the biggest problems in the current AI zeitgeist: we are not really working on any neuron or ruleset that is much different from the perceptron thats just a sumnation function. Is it really that surprising that we just see this same structure repeated in the models? Just because feedforward topologies with single neuron steps are the easiest to train and run on graphics cards does that really make them the actual best at accomplishing tasks? We have all sorts of unique training methods and encoding schemes that never get used because the big libraries don't support them. Until we start seeing real variation in the fundamental rulesets of neural nets, we are always just going to be fighting against the fact that these are just perceptrons with extra steps.
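
For readers who haven't met one, here is a minimal sketch of the classic perceptron the comment refers to: a weighted sum of the inputs plus a bias, passed through a step function and trained with Rosenblatt's error-driven update. The AND-gate data, the integer learning rate, and the epoch count are illustrative assumptions, not anything from the thread.

```python
def predict(weights, bias, x):
    """Perceptron output: a step function applied to a weighted sum plus bias."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, lr=1):
    """Rosenblatt's perceptron learning rule on (inputs, label) pairs."""
    n = len(samples[0][0])
    weights = [0] * n
    bias = 0
    for _ in range(epochs):
        for x, label in samples:
            error = label - predict(weights, bias, x)
            # Weights only move when the prediction is wrong (error != 0).
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Illustrative data: the logical AND function, which is linearly separable.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])  # prints [0, 0, 0, 1]
```

Everything interesting here is the single `sum(...)` line; the training rule just nudges the weights of that sum.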

visarga 812 days ago

> Just because feedforward topologies with single neuron steps are the easiest to train and run on graphics cards does that really make them the actual best at accomplishing tasks?

You are ignoring a mountain of papers trying every conceivable approach to building models. It is evolution by selection, and in the end transformers won.

retrofrost 812 days ago

Just because papers are getting published doesn't mean the ideas are actually gaining any traction. I mean, we have known that the time series of signals a neuron receives plays a huge role in how biological neurons functionally operate, and yet we have nearly no examples of spiking networks being pushed beyond basic academic exploration. We have known that glial cells play a critical role in biological neural networks, and yet you can probably count the number of papers that examine using an abstraction of that activity in neural nets on both your hands and toes. Neuroevolution using genetic algorithms has basically been looking for a big break since NEAT. It's the height of hubris to say that we have peaked with transformers when the entire field is based on not getting trapped in local maxima's. Sorry to be snippy, but there is so much uncovered ground it's not even funny.
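
The point about signal timing can be made concrete with the leaky integrate-and-fire (LIF) model, a standard simplified spiking neuron: unlike the perceptron, its output depends on when inputs arrive, not just on their sum. This is a minimal discrete-time sketch; the time constant, threshold, and input pulses are arbitrary illustrative values, and serious spiking-network work uses richer models.

```python
def simulate_lif(inputs, tau=10.0, threshold=1.0, v_reset=0.0, dt=1.0):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    inputs: input current at each time step (illustrative units).
    Returns the time steps at which the neuron fired.
    """
    v = v_reset
    spikes = []
    for t, current in enumerate(inputs):
        # The potential leaks toward zero, then adds this step's input.
        v += dt * (-v / tau) + current
        if v >= threshold:
            spikes.append(t)
            v = v_reset  # reset the membrane potential after a spike
    return spikes


# Same total input (1.2), delivered close together vs. spread out in time.
clustered = [0.4, 0.4, 0.4] + [0.0] * 8
spread = [0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4]
print(simulate_lif(clustered))  # [2]: the pulses add up before they leak away
print(simulate_lif(spread))     # []: each pulse decays before the next arrives
```

A perceptron fed the summed input would give the same answer in both cases; the LIF neuron does not.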

gwervc 812 days ago

"We" are not forbidding you to open a computer, start experimenting and publishing some new method. If you're so convinced that "we" are stuck in a local maxima, you can do some of the work you are advocating instead of asking others to do it for you.

Kerb_ 812 days ago

You can think chemotherapy is a local maxima for cancer treatment and hope medical research seeks out other options without having the resources to do it yourself. Not all of us have access to the tools and resources to start experimenting as casually as we wish we could.

erisinger 812 days ago

Not a single one of you bigbrains used the word "maxima" correctly and it's driving me crazy.

vlovich123 812 days ago

As I understand it, a local maxima means you’re at a local peak, but there may be higher maximums elsewhere. As I read it, transformers are a local maximum in the sense of outperforming all other ML techniques as the AI technique that gets the closest to human intelligence.

Can you help my little brain understand the problem by elaborating?

Also you may want to chill with the personal attacks.
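
The local-versus-global distinction discussed here can be illustrated with greedy hill climbing: the climber stops at the first point where no neighbouring step improves the score, which may be a lower peak than the best one available. A minimal sketch over a made-up one-dimensional landscape; the values are arbitrary.

```python
def hill_climb(values, start):
    """Greedy ascent on a list of scores: step to the best neighbor until none is higher."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        best = max(neighbors, key=lambda j: values[j])
        if values[best] <= values[i]:
            return i  # no neighbor is higher: a (possibly only local) maximum
        i = best


# Made-up landscape: a small peak at index 2 and a taller one at index 7.
landscape = [1, 3, 5, 4, 2, 6, 8, 9, 7]
print(hill_climb(landscape, start=0))  # 2: stuck on the local maximum
print(hill_climb(landscape, start=5))  # 7: reaches the global maximum
```

Where the search starts decides which peak it finds, which is the worry being raised about the field as a whole.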

antonvs 812 days ago

“Maxima” sounds fancy, making it catnip for people trying to sound smart.

tschwimmer 812 days ago

yeah, not a Nissan in sight

mikewarot 812 days ago

MNIST and other small and easy to train against datasets are widely available. You can try out anything you like even with a cheap laptop these days thanks to a few decades of Moore's law.

It is definitely NOT out of your reach to try any ideas you have. Kaggle and other sites exist to make it easy.

Good luck! 8)

retrofrost 812 days ago

My pet project has been trying to use Elixir with NEAT or HyperNEAT to make a spiking network, then, once that's working decently, drop in some glial interactions I saw in a paper. It would be kinda bad at purely functional stuff, but idk, it seems fun. The biggest problems are time and having to build both the evolutionary side and the network side. But yeah, the ubiquity of free datasets does make it easy to train.

importantbrian 808 days ago

Not to mention that not everyone can be devoted to doing cancer research. Some doctors and nurses are necessary to, you know, actually treat the people who have cancer.

haltIncomplete 812 days ago

All we’re doing is engineering new data compression and retrieval techniques: https://arxiv.org/abs/2309.10668

Are we sure there’s anything “net new” to find within the same old x86 machines, within the same old axiomatic systems of the past?

Math is a few operations applied to carving up stuff, and we believe we can do that infinitely in theory. So “all math that abides our axiomatic underpinnings” is valid regardless of whether we “prove it” or not.

The physical space we can exist in, a middle ground of reality we evolved just so to exist in, seems to be finite; I can’t just up and move to Titan or Mars. So our computers are coupled to the same constraints of observation and understanding as we are.

What about daily life will be upended by reconfirming decades-old experiments? How is this not living in the sunk cost fallacy?

When all you have is a hammer…

I’m reminded of Einstein’s quote about insanity.

aldousd666 812 days ago

Einstein didn't say that about insanity, but... systems exist and are consistently described by particular equations at particular scales. Sure, we can say everything is quantum mechanics; even classical physics can technically be translated into a series of wave functions that explain the same behaviors we observe, if we could measure it all... But it's impractical, and some of the concepts we think of as fundamental at certain scales, like nucleons, don't exist at others, like the equations that describe the energy of empty space. So it's maybe not quite a fallacy to point out that not every concept we find useful, like deep learning inference, cogently encapsulates every rule at every scale we know about, down to the electrons. None of our theories do that, and even if they did, we couldn't measure or process everything needed to check whether we're even right. So we use models that differ from each other but emerge from each other, though only when we cross certain scale thresholds.

samus 812 days ago

If you abstract far enough, then yes, everything we are doing is somehow akin to what we have done before. But that also applies to what Einstein did.

typon 812 days ago

Do you really think that transformers came to us from God? They're built on the corpses of millions of models that never went anywhere. I spent an entire year trying to scale up a stupid RNN back in 2014. Never went anywhere, because it didn't work. I am sure we are stuck in a local minima now, but it's able to solve problems that were previously impossible. So we will use it until we are impossibly stuck again. Currently, however, we have barely begun to scratch the surface of what's possible with these models.

leoc 812 days ago

(The singulars are ‘maximum’ and ‘minimum’, ‘maxima’ and ‘minima’ are the plurals.)

samus 812 days ago

Who said that we peaked with transformers? I sure hope we did not. The current focus on them is just institutional inertia. Worst case, another AI winter comes, at the end of which a newer, more promising technology manages to attract funding anew.

nicklecompte 812 days ago

His point is that "evolution by selection" also includes the fact that transformers are easy to implement with modern linear algebra libraries and cheap to scale on current silicon, both of which are engineering details with no direct relationship to their innate efficacy at learning (though indirectly it means you can scale up the training data to compensate for less efficient learning).

wanderingbort 812 days ago

I think it is correct to include practical implementation costs in the selection.

Theoretical efficacy doesn’t guarantee real world efficacy.

I accept that this is self-reinforcing, but I favor real gains today over potentially larger gains in a potentially achievable future.

I also think we are learning practical lessons on the periphery of any application of AI that will apply if a mold-breaking solution becomes compelling.

foobiekr 812 days ago

"won"

They barely work for a lot of cases (i.e., anything where accuracy matters, despite the bubble's wishful thinking). It's likely that something will sunset them in the next few years.

victorbjorklund 812 days ago

That is how evolution works. Something wins until something else comes along and wins. And so on forever.

Retric 812 days ago

Evolution generally favors multiple winners in different roles over a single dominant strategy.

People tend to favor single winners.

advael 812 days ago

I think this is both a really astute and important observation and one that's more true locally than of people broadly. Modern neoliberal business culture in general, and the consolidated current incarnation of the tech industry in particular, have strong "tunnel vision" and belief in chasing optimality compared to many other cultures, both extant and past.

imtringued 812 days ago

In neoclassical economics, there are no local maxima, because it would make the math intractable and expose how much of a load of bullshit most of it is.

refulgentis 812 days ago

It comes across as cloyingly performative grumpy-old-man stuff once you're at "it barely works and it's a bubble and blah blah" in response to a discussion about their comparative advantage (yeah, they won, and absolutely convincingly so).

wizzwizz4 812 days ago

That's like saying Bitcoin won cryptography.

antonvs 812 days ago

I’d say it’s more that transformers are in the lead at the moment, for general applications. There’s no rigorous reason afaik that it should stay that way.

jjtheblunt 812 days ago

> in the end transformers won

we're at the end?

dartos 812 days ago

I mean RWKV seems promising and isn’t a transformer model.

Transformers have first-mover advantage. They were the first models that scaled to large parameter counts.

That doesn’t mean they’re the best or that they’ve won, just that they were the first to get big (literally and metaphorically).

refulgentis 812 days ago

It doesn't seem promising: a one-man band has been on a quixotic quest based on intuition, and it's gotten ~nowhere, and not for lack of interest in alternatives. There's never been a better time to have a different approach. Is your metric "times I've seen it on HN with a convincing argument for it being promising"? I'm not embarrassed to admit that is/was mine. Alternatively, you're aware of recent breakthroughs I haven't seen.

dartos 811 days ago

RWKV has shown that you can scale RNNs to large parameter counts.

The fact that one person (initially) was able to do it highlights how much low-hanging fruit there is for non-transformer architectures.

Also, the fact that a small number of people designed, trained, and published 5 versions of a perfectly serviceable model (as in, it has decent summarizing ability, the biggest LLM use case) that doesn’t have the quadratic time complexity of transformers is a big deal.
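
The complexity point can be made concrete with a rough operation count. This is an illustrative simplification, not RWKV's actual formulation: causal self-attention scores each new token against every earlier token, so the number of pairwise scores grows quadratically with sequence length, while a recurrent model does one fixed-size state update per token. Per-operation costs (hidden size, number of heads) are deliberately ignored.

```python
def attention_score_count(n_tokens):
    """Query-key scores in causal self-attention: token t is scored against t + 1 positions."""
    return sum(t + 1 for t in range(n_tokens))  # equals n * (n + 1) / 2


def recurrent_step_count(n_tokens):
    """A recurrent model performs one fixed-size state update per token."""
    return n_tokens


for n in (1_000, 10_000, 100_000):
    print(n, attention_score_count(n), recurrent_step_count(n))
# 1000 500500 1000
# 10000 50005000 10000
# 100000 5000050000 100000
```

A 100x longer sequence costs roughly 10,000x more attention scores but only 100x more recurrent steps.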

tkellogg 812 days ago

Yeah, I'd argue that transformers created such capital saturation that there's a ton of opportunity for alternative approaches to emerge.

dartos 812 days ago

Speak of the devil. Jamba just hit the front page.

“end”

> the perceptron thats just a sumnation[sic] function

What would you suggest?

My understanding of part of the whole NP-Complete thing is that any algorithm in the complexity class can be reduced to, among other things, a 'summation function'.

ldjkfkdsjnv 812 days ago

I can't understand people claiming we are in a local maxima when we literally had an AI scientific breakthrough just in the last two years.

xanderlewis 812 days ago

Which breakthrough in the last two years are you referring to?

6gvONxR4sf7o 812 days ago

If you had to reduce it to one thing, it's probably that language models are capable few-shot and zero-shot learners. In other words, by training a model simply to predict the next word on naturally occurring text, you end up with a tool you can use for generic tasks, roughly speaking.
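
In practice, "few-shot" means placing a handful of worked examples in the prompt and letting the model continue the pattern; no weights are updated. Zero-shot is the same prompt with no examples. A minimal sketch of building such a prompt; the task, the Input/Output format, and the examples are hypothetical, and no particular model API is assumed.

```python
def few_shot_prompt(examples, query, instruction):
    """Build a plain-text prompt: instruction, worked examples, then the open query."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model's next-word predictions continue from here
    return "\n".join(lines)


examples = [("cheese", "fromage"), ("bread", "pain")]
print(few_shot_prompt(examples, "apple", "Translate English to French."))
# Zero-shot variant: same prompt with an empty example list.
print(few_shot_prompt([], "apple", "Translate English to French."))
```

The "task" lives entirely in the text; the model only ever does next-word prediction.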

xyzzy_plugh 812 days ago

It turns out a lot of tasks are predictable. Go figure.

link

ldjkfkdsjnv 812 days ago

the LLM scaling law
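
For context, "scaling law" here usually refers to empirical power laws relating loss to model size, data, and compute. One commonly cited form (Kaplan et al., 2020, "Scaling Laws for Neural Language Models") writes test loss as a function of non-embedding parameter count N when data and compute are not the bottleneck, with N_c and α_N as empirically fitted constants. A consequence is that each doubling of N shrinks the loss by a constant factor rather than a constant amount:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
\quad\Longrightarrow\quad
\frac{L(2N)}{L(N)} \approx \left(\frac{N_c}{2N}\right)^{\alpha_N} \Big/ \left(\frac{N_c}{N}\right)^{\alpha_N} = 2^{-\alpha_N}
```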

posix86 812 days ago

I don't understand enough about the subject to say, but to me it seemed like yes, other models have better metrics at equal model size, in terms of number of neurons or asymptotic runtime, but the most important metric will always be accuracy/precision/etc. per dollar spent... In other words, if GPT requires 10x the number of neurons to reach the same performance, but buying compute and memory for those neurons is cheaper, then GPT is a better means to an end.
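
The trade-off described here is simple arithmetic. A sketch with entirely hypothetical numbers: an architecture that needs 10x the parameters to reach a target score can still be the cheaper route if each parameter is more than 10x cheaper to train and serve on the hardware that's actually available. The names and costs below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A hypothetical model option for reaching some fixed target score."""
    name: str
    params_needed: float   # parameters required to hit the target score
    cost_per_param: float  # hypothetical dollars per parameter on available hardware

    def total_cost(self) -> float:
        return self.params_needed * self.cost_per_param


# Made-up numbers: the GPT-style option needs 10x the parameters,
# but each parameter is 20x cheaper on commodity GPUs.
alternative = Option("alternative", params_needed=1e9, cost_per_param=2e-6)
gpt_style = Option("GPT-style", params_needed=1e10, cost_per_param=1e-7)

cheaper = min((alternative, gpt_style), key=Option.total_cost)
print(cheaper.name, round(cheaper.total_cost()))
```

Flip the per-parameter ratio below 10x and the "less efficient" architecture loses; the ranking depends on hardware prices, not just model design.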

blueboo 812 days ago

The bitter lesson, my dude. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

If you find a simpler, trainable structure, you might be onto something.

Attempts to get fancy tried and died
