| HN Mirror



	by visarga 812 days ago
	> Just because feedforward topologies with single neuron steps are the easiest to train and run on graphics cards does that really make them the actual best at accomplishing tasks?

	You are ignoring a mountain of papers trying all conceivable approaches to create models. It is evolution by selection; in the end transformers won.

7 comments

retrofrost 812 days ago

Just because papers are getting published doesn't mean it's actually gaining any traction. I mean, we have known that the time series of signals received plays a huge role in how bio neurons functionally operate, and yet we have nearly no examples of spiking networks being pushed beyond basic academic exploration. We have known glial cells play a critical role in biological neural networks, and yet you can probably count the number of papers that examine using an abstraction of that activity in a neural net on both your hands and toes. Neuroevolution using genetic algorithms has been basically looking for a big break since NEAT. It's the height of hubris to say that we have peaked with transformers when the entire field is based on not getting trapped in local maxima's. Sorry to be snippy, but there is so much uncovered ground it's not even funny.
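For readers unfamiliar with why signal timing matters in spiking models, here is a minimal sketch of a leaky integrate-and-fire neuron. It is a textbook simplification, not anything the commenter posted; the function name, the `threshold` and `leak` values, and the two input sequences are all assumptions chosen for illustration. Both inputs carry the same total current, but only the one whose pulses arrive close together makes the neuron fire.

```python
def lif_spike_times(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire: return the time steps at which the neuron spikes.

    threshold and leak are illustrative values, not taken from the thread.
    """
    v = 0.0  # membrane potential
    spikes = []
    for t, current in enumerate(input_current):
        v = leak * v + current  # old potential decays, new input is added
        if v >= threshold:
            spikes.append(t)
            v = 0.0  # reset after firing
    return spikes


# Same total input (1.2), different timing.
clustered = [0.6, 0.6, 0.0, 0.0, 0.0, 0.0]
spread = [0.6, 0.0, 0.0, 0.0, 0.0, 0.6]

print(lif_spike_times(clustered))  # pulses arrive back to back
print(lif_spike_times(spread))     # first pulse has leaked away before the second
```

A plain feedforward layer that only sums its inputs would treat these two sequences identically, which is roughly the distinction the comment is pointing at.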

gwervc 812 days ago

"We" are not forbidding you to open a computer, start experimenting and publishing some new method. If you're so convinced that "we" are stuck in a local maxima, you can do some of the work you are advocating instead of asking others to do it for you.

Kerb_ 812 days ago

You can think chemotherapy is a local maxima for cancer treatment and hope medical research seeks out other options without having the resources to do it yourself. Not all of us have access to the tools and resources to start experimenting as casually as we wish we could.

erisinger 812 days ago

Not a single one of you bigbrains used the word "maxima" correctly and it's driving me crazy.

vlovich123 812 days ago

As I understand it a local maxima means you’re at a local peak but there may be higher maximums elsewhere. As I read it, transformers are a local maximum in the sense of outperforming all other ML techniques as the AI technique that gets the closest to human intelligence.

Can you help my little brain understand the problem by elaborating?

Also you may want to chill with the personal attacks.
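The "local peak versus higher peak elsewhere" idea described above can be sketched with a greedy hill climber on a made-up one-dimensional landscape. This is only an illustration under stated assumptions: the `landscape` values and the `hill_climb` helper are hypothetical and say nothing about how any real model is trained.

```python
def hill_climb(heights, start):
    """Greedy ascent: step to the higher neighbor until no neighbor is higher."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]
        best = max(neighbors, key=lambda j: heights[j])
        if heights[best] <= heights[i]:
            return i  # no neighbor improves on the current position
        i = best


# Hypothetical landscape with a small peak (5) and a taller one (9).
landscape = [1, 3, 5, 4, 2, 6, 9, 7]

stuck_at = hill_climb(landscape, start=0)    # climbs the small peak and stops
reached = hill_climb(landscape, start=4)     # starts in the valley, finds the tall peak

print(stuck_at, landscape[stuck_at])
print(reached, landscape[reached])
```

Starting on the left, the climber stops on the smaller peak because every single step away from it goes downhill, even though a higher point exists further along.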

erisinger 812 days ago

Not a personal attack. These posters are smarter than I am, just ribbing them about misusing the terminology.

"Maxima" is plural, "maximum" is singular. So you would say "a local maximum," or "several local maxima." Not "a local maxima" or, the one that really got me, "getting trapped in local maxima's."

As for the rest of it, carry on. Good discussion.

antonvs 812 days ago

“Maxima” sounds fancy, making it catnip for people trying to sound smart.

tschwimmer 812 days ago

yeah, not a Nissan in sight

mikewarot 812 days ago

MNIST and other small, easy-to-train-against datasets are widely available. You can try out anything you like even with a cheap laptop these days thanks to a few decades of Moore's law.

It is definitely NOT out of your reach to try any ideas you have. Kaggle and other sites exist to make it easy.

Good luck! 8)

retrofrost 812 days ago

My pet project has been trying to use Elixir with NEAT or HyperNEAT to make a spiking network, then, when that's working decently, drop in some glial interactions I saw in a paper. It would be kinda bad at purely functional stuff, but idk, seems fun. The biggest problems are time and having to do a lot of both the evolutionary stuff and the network stuff. But yeah, the ubiquity of free datasets does make it easy to train.

importantbrian 809 days ago

Not to mention not everyone can be devoted to doing cancer research. Some doctors and nurses are necessary to, you know, actually treat the people who have cancer.

haltIncomplete 812 days ago

All we’re doing is engineering new data compression and retrieval techniques: https://arxiv.org/abs/2309.10668

Are we sure there’s anything “net new” to find within the same old x86 machines, within the same old axiomatic systems of the past?

Math is a few operations applied to carving up stuff and we believe we can do that infinitely in theory. So “all math that abides our axiomatic underpinnings” is valid regardless if we “prove it” or not.

Physical space we can exist in, a middle ground of reality we evolved just so to exist in, seems to be finite; I can’t just up and move to Titan or Mars. So our computers are coupled to the same constraints of observation and understanding as us.

What about daily life will be upended by reconfirming decades-old experiments? How is this not living in the sunk cost fallacy?

When all you have is a hammer…

I’m reminded of Einstein’s quote about insanity.

aldousd666 812 days ago

Einstein didn't say that about insanity, but... systems exist and are consistently described by particular equations at particular scales. Sure, we can say everything is quantum mechanics; even classical physics can technically be translated as a series of wave functions that explain the same behaviors we observe, if we could measure it... But it's impractical, and some of the concepts we think of as fundamental to certain scales, like nucleons, don't exist at others, like equations that describe the energy of empty space. So, it's maybe not quite a fallacy to point out that not every concept we find to be useful, like deep learning inference, encapsulates every rule at every scale that we know about down to the electrons, cogently. Because none of our theories do that, and even if they did, we couldn't measure or process all the things needed to check and see if we're even right. So we use models that differ from each other, but that emerge from each other, but only when we cross certain scale thresholds.

samus 812 days ago

If you abstract far enough then yes, everything we are doing is somehow akin to what we have done before. But that then also applies to what Einstein has done.

typon 812 days ago

Do you really think that transformers came to us from God? They're built on the corpses of millions of models that never went anywhere. I spent an entire year trying to scale up a stupid RNN back in 2014. Never went anywhere, because it didn't work. I am sure we are stuck in a local minima now - but it's able to solve problems that were previously impossible. So we will use it until we are impossibly stuck again. Currently, however, we have barely begun to scratch the surface of what's possible with these models.

leoc 812 days ago

(The singulars are ‘maximum’ and ‘minimum’; ‘maxima’ and ‘minima’ are the plurals.)

samus 812 days ago

Who said that we peaked with transformers? I sure hope we did not. The current focus on them is just institutional inertia. Worst case another AI winter comes, at the end of which a newer, more promising technology would manage to attract funding anew.

nicklecompte 812 days ago

His point is that "evolution by selection" also includes that transformers are easy to implement with modern linear algebra libraries and cheap to scale on current silicon, both of which are engineering details with no direct relationship to their innate efficacy at learning (though indirectly it means you can scale up the training data to make up for more inefficient learning).

wanderingbort 812 days ago

I think it is correct to include practical implementation costs in the selection.

Theoretical efficacy doesn’t guarantee real world efficacy.

I accept that this is self reinforcing but I favor real gains today over potentially larger gains in a potentially achievable future.

I also think we are learning practical lessons on the periphery of any application of AI that will apply if a mold-breaking solution becomes compelling.

foobiekr 812 days ago

"won"

They barely work for a lot of cases (i.e., anything where accuracy matters, despite the bubble's wishful thinking). It's likely that something will sunset them in the next few years.

victorbjorklund 812 days ago

That is how evolution works. Something wins until something else comes along and wins. And so on forever.

Retric 812 days ago

Evolution generally favors multiple winners in different roles over a single dominant strategy.

People tend to favor single winners.

advael 812 days ago

I both think this is a really astute and important observation and also think it's an observation that's more true locally than of people broadly. Modern neoliberal business culture generally and the consolidated current incarnation of the tech industry in particular have strong "tunnel vision" and belief in chasing optimality compared to many other cultures, both extant and past.

imtringued 812 days ago

In neoclassical economics, there are no local maxima, because it would make the math intractable and expose how much of a load of bullshit most of it is.

foobiekr 811 days ago

Yep. This. It’s impressive how communication is instantaneous, unimpeded, complete and transparent in economics.

Those things aren’t even true in a 500 person company let alone an economy.

refulgentis 812 days ago

It comes across as a cloyingly performative grumpy-old-man routine once you're at "it barely works and it's a bubble and blah blah" in response to a discussion about their comparative advantage (yeah, they won, and absolutely convincingly so).

wizzwizz4 812 days ago

That's like saying Bitcoin won cryptography.

antonvs 812 days ago

I’d say it’s more that transformers are in the lead at the moment, for general applications. There’s no rigorous reason afaik that it should stay that way.

jjtheblunt 812 days ago

> in the end transformers won

we're at the end?

dartos 812 days ago

I mean RWKV seems promising and isn’t a transformer model.

Transformers have first mover advantage. They were the first models that scaled to large parameter counts.

That doesn’t mean they’re the best or that they’ve won, just that they were the first to get big (literally and metaphorically)

refulgentis 812 days ago

It doesn't seem promising: a one-man band has been on a quixotic quest based on intuition and it's gotten ~nowhere, and it's not for lack of interest in alternatives. There's never been a better time to have a different approach. Is your metric "times I've seen it on HN with a convincing argument for it being promising"? I'm not embarrassed to admit that is/was mine, but alternatively, you're aware of recent breakthroughs I haven't seen.

dartos 811 days ago

RWKV has shown that you can scale RNNs to large parameter counts.

The fact that one person (initially) was able to do it highlights how much low hanging fruit there is for non transformers.

Also, the fact that a small number of people designed, trained, and published 5 versions of a perfectly serviceable model (as in, it has decent summarizing ability, the biggest LLM use case) which doesn’t have the time complexity of transformers is a big deal.
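As a rough illustration of the time-complexity point, standard self-attention compares every token with every other token, so the work grows with the square of the sequence length, while a recurrent model does one state update per token. The sketch below only counts those operations; the function names are hypothetical, and it deliberately ignores hidden size, constant factors, and any attention optimizations. It is not a description of how RWKV itself is implemented.

```python
def attention_pair_count(seq_len: int) -> int:
    """Token-to-token score computations in full (unmasked) self-attention."""
    return seq_len * seq_len


def recurrent_step_count(seq_len: int) -> int:
    """State updates in a recurrent model: one per token."""
    return seq_len


for n in (1_000, 2_000, 4_000):
    print(n, attention_pair_count(n), recurrent_step_count(n))
```

Doubling the sequence length quadruples the attention count but only doubles the recurrent count, which is why long contexts are where the difference matters most.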

tkellogg 812 days ago

Yeah, I'd argue that transformers created such capital saturation that there's a ton of opportunity for alternative approaches to emerge.

dartos 812 days ago

Speak of the devil. Jamba just hit the front page.

szundi 812 days ago

“end”