HN Mirror


by dartos 812 days ago

I mean RWKV seems promising and isn’t a transformer model.

Transformers have first-mover advantage. They were the first models that scaled to large parameter counts.

That doesn’t mean they’re the best or that they’ve won, just that they were the first to get big (literally and metaphorically).

2 comments

refulgentis 812 days ago

It doesn't seem promising. A one-man band has been on a quixotic quest based on intuition, and it's gotten ~nowhere, and that's not for lack of interest in alternatives. There's never been a better time to have a different approach. Is your metric "times I've seen it on HN with a convincing argument for it being promising"? I'm not embarrassed to admit that is/was mine. Alternatively, maybe you're aware of recent breakthroughs I haven't seen.


dartos 811 days ago

RWKV has shown that you can scale RNNs to large parameter counts.

The fact that one person (initially) was able to do it highlights how much low-hanging fruit there is for non-transformers.

Also, the fact that a small number of people designed, trained, and published 5 versions of a perfectly serviceable model (as in, it has decent summarizing ability, the biggest LLM use case) that doesn’t have the time complexity of transformers is a big deal.
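
To make the time-complexity point concrete, here's a rough sketch, assuming the comparison is about sequence length: standard self-attention scores every token against every other token, while a recurrent model does one state update per token. The function names are made up for illustration, and this only counts operations; it is not RWKV's actual implementation.

```python
def attention_pair_count(n_tokens: int) -> int:
    """Token-pair scores computed by full self-attention: n * n."""
    return n_tokens * n_tokens


def recurrent_step_count(n_tokens: int) -> int:
    """State updates performed by a recurrent model: one per token."""
    return n_tokens


# Doubling the sequence length quadruples the attention work
# but only doubles the recurrent work.
for n in (1_000, 2_000, 4_000):
    print(n, attention_pair_count(n), recurrent_step_count(n))
```

The real constants differ a lot between architectures, so treat this as the shape of the scaling argument, not a benchmark.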


tkellogg 812 days ago

Yeah, I'd argue that transformers created such capital saturation that there's a ton of opportunity for alternative approaches to emerge.


dartos 812 days ago

Speak of the devil. Jamba just hit the front page.
