Hacker News new | ask | show | jobs
by visarga 812 days ago
> Just because feedforward topologies with single neuron steps are the easiest to train and run on graphics cards does that really make them the actual best at accomplishing tasks?

You are ignoring a mountain of papers trying all conceivable approaches to create models. It is evolution by selection, in the end transformers won.

7 comments

Just because papers are getting published doesn't mean its actually gaining any traction. I mean we have known that time series of signals recieves plays a huge role in how bio neurons functionally operate and yet we have nearly no examples of spiking networks being pushed beyond basic academic exploration. We have known glial cells play a critical role in biological neural and yet you can probably count the number of papers that examine using an abstraction of that activity in neural net, on both your hands and toes. Neuroevolution using genetic algorithms has been basically looking for a big break since NEAT. Its the height of hubris to say that we have peaked with transformers when the entire field is based on not getting trapped in local maxima's. Sorry to be snippy, but there is so much uncovered ground its not even funny.
"We" are not forbidding you to open a computer, start experimenting and publishing some new method. If you're so convinced that "we" are stuck in a local maxima, you can do some of the work you are advocating instead of asking other to do it for you.
You can think chemotherapy is a local maxima for cancer treatment and hope medical research seeks out other options without having the resources to do it yourself. Not all of us have access to the tools and resources to start experimenting as casually as we wish we could.
Not a single one of you bigbrains used the word "maxima" correctly and it's driving me crazy.
As I understand it a local maxima means you’re at a local peak but there may be higher maximums elsewhere. As I read it, transformers are a local maximum in the sense of outperforming all other ML techniques as the AI technique that gets the closest to human intelligence.

Can you help my little brain understand the problem by elaborating?

Also you may want to chill with the personal attacks.

Not a personal attack. These posters are smarter than I am, just ribbing them about misusing the terminology.

"Maxima" is plural, "maximum" is singular. So you would say "a local maximum," or "several local maxima." Not "a local maxima" or, the one that really got me, "getting trapped in local maxima's."

As for the rest of it, carry on. Good discussion.

“Maxima” sounds fancy, making it catnip for people trying to sound smart.
yeah, not a Nissan in sight
MNIST and other small and easy to train against datasets are widely available. You can try out anything you like even with a cheap laptop these days thanks to a few decades of Moore's law.

It is definitely NOT out of your reach to try any ideas you have. Kaggle and other sites exist to make it easy.

Good luck! 8)

My pet project has been trying to use elixir with NEAT or HyperNEAT to try and make a spiking network, then when thats working decently drop some glial interactions I saw in a paper. It would be kinda bad at purely functional stuff, but idk seems fun. The biggest problems are time and having to do a lot of both the evolutionary stuff and the network stuff. But yeah the ubiquity of free datasets does make it easy to train.
Not to mention not everyone can be devoted to doing cancer research. Some Drs. and Nurses are necessary to you know actually treat the people who have cancer.
All we’re doing is engineering new data compression and retrieval techniques: https://arxiv.org/abs/2309.10668

Are we sure there’s anything “net new” to find within the same old x86 machines, within the same old axiomatic systems of the past?

Math is a few operations applied to carving up stuff and we believe we can do that infinitely in theory. So “all math that abides our axiomatic underpinnings” is valid regardless if we “prove it” or not.

Physical space we can exist in, a middle ground of reality we evolved just so to exist in, seems to be finite; I can’t just up and move to Titan or Mars. So our computers are coupled to the same constraints of observation and understanding as us.

What about daily life will be upended reconfirming decades old experiment? How is this not living in sunk cost fallacy?

When all you have is a hammer…

I’m reminded of Einstein’s quote about insanity.

Einstein didn't say that about insanity, but... systems exist and are consistently described by particular equations at particular scales. Sure we can say everything is quantum mechanics, even classical physics can technically be translated as a series of wave functions that explain the same behaviors we observe, if we could measure it... But it's impractical, and some of the concepts we think of as fundamental to certain scales, like nucleons, didn't exist at others, like equations that describe the energy of empty space. So, it's maybe not quite a fallacy to point out that not every concept we find to be useful, like deep learning inference, encapsulate every rule at every scale that we know about down to the electrons, cogently. Because none of our theories do that, and even if they did, we couldn't measure or process all the things needed to check and see if we're even right. So we use models that differ from each other, but that emerge from each other, but only when we cross certain scale thresholds.
If you abstract far enough then yes, everything what we are doing is somehow akin to what we have done before. But that then also applies to what Einstein has done.
Do you really think that transformers came to us from God? They're built on the corpses of millions of models that never went anywhere. I spent an entire year trying to scale up a stupid RNN back in 2014. Never went anywhere, because it didn't work. I am sure we are stuck in a local minima now - but it's able to solve problems that were previously impossible. So we will use it until we are impossibly stuck again. Currently, however, we have barely begun to scratch the surface of what's possible with these models.
(The singulars are ‘maximum’ and ‘minimum’, ‘maxima’ and ‘minima’ are the plurals.)
Who said that we peaked with transformers? I sure hope we did not. The current focus on them is just institutional inertia. Worst case another AI winter comes, at the end of which a newer, more promising technology would manage to attract funding anew.
His point is that "evolution by selection" also includes that transformers are easy to implement with modern linear algebra libraries and cheap to scale on current silicon, both of which are engineering details with no direct relationship to their innate efficacy at learning (though indirectly it means you scale up the training data for more inefficient learning).
I think it is correct to include practical implementation costs in the selection.

Theoretical efficacy doesn’t guarantee real world efficacy.

I accept that this is self reinforcing but I favor real gains today over potentially larger gains in a potentially achievable future.

I also think we are learning practical lessons on the periphery of any application of AI that will apply if a mold-breaking solution becomes compelling.

"won"

They barely work for a lot of cases (i.e., anything where accuracy matters, despite the bubble's wishful thinking). It's likely that something will sunset them in the next few years.

That is how evolution works. Something wins until something else comes along and win. And so on forever.
Evolution generally favors multiple winners in different roles over a single dominate strategy.

People tend to favor single winners.

I both think this is a really astute and important observation and also think it's an observation that's more true locally than of people broadly. Modern neoliberal business culture generally and the consolidated current incarnation of the tech industry in particular have strong "tunnel vision" and belief in chasing optimality compared to many other cultures, both extant and past
In neoclassical economics, there are no local maxima, because it would make the math intractable and expose how much of a load of bullshit most of it is.
Yep. This. It’s impressive how communication is instantaneous, unimpeded, complete and transparent in economics.

Those things aren’t even true in a 500 person company let alone an economy.

It seems cloyingly performative grumpy old man once you're at "it barely works and it's a bubble and blah blah" in response to a discussion about their comparative advantage (yeah, they won, and absolutely convincingly so)
That's like saying Bitcoin won cryptography.
I’d say it’s more that transformers are in the lead at the moment, for general applications. There’s no rigorous reason afaik that it should stay that way.
> in the end transformers won

we're at the end?

I mean RWKV seems promising and isn’t a transformer model.

Transformers have first mover advantage. They were the first models that scaled to large parameter counts.

That doesn’t mean they’re the best or that they’ve won, just that they were the first to get big (literally and metaphorically)

It doesn't seem promising, a one man band has been doing a quixotic quest based on intuition and it's gotten ~nowhere, and it's not for lack of interest in alternatives. There's never been a better time to have a different approach - is your metric "times I've seen it on HN with a convincing argument for it being promising?" -- I'm not embarrassed to admit that is/was mine, but alternatively, you're aware of recent breakthroughs I haven't seen.
RWKV has shown that you can scale RNNs to large parameter counts.

The fact that one person (initially) was able to do it highlights how much low hanging fruit there is for non transformers.

Also, the fact that a small number of people designed, trained, and published 5 versions of a perfectly serviceable (as in has decent summarizing ability. The biggest LLM use case) model which doesn’t have the time complexity of transformers is a big deal.

Yeah, I'd argue that transformers created such capital saturation that there's a ton of opportunity for alternative approaches to emerge.
Speak of the devil. Jamba just hit the front page.
“end”