Hacker News
by refulgentis 814 days ago
It doesn't seem promising: a one-man band has been on a quixotic quest based on intuition and it's gotten ~nowhere, and not for lack of interest in alternatives. There's never been a better time to have a different approach. Is your metric "times I've seen it on HN with a convincing argument for it being promising"? I'm not embarrassed to admit that is/was mine. Alternatively, maybe you're aware of recent breakthroughs I haven't seen.
1 comment

RWKV has shown that you can scale RNNs to large parameter counts.

The fact that one person (initially) was able to do it highlights how much low-hanging fruit there is for non-transformer architectures.

Also, the fact that a small number of people designed, trained, and published 5 versions of a perfectly serviceable model (as in, it has decent summarizing ability, which is the biggest LLM use case) that doesn’t have the quadratic-in-sequence-length time complexity of transformers is a big deal.
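
To make the time-complexity point concrete, here is a rough sketch. It is a toy cost model, not a benchmark and not RWKV's code: the function names are made up for illustration, and `running_state` is a generic stand-in recurrence, not RWKV's actual update rule. It only counts how many token-to-token interactions each approach needs as the sequence grows.

```python
# Toy cost model: counts interactions only, ignores constants and hardware.

def attention_interactions(seq_len: int) -> int:
    """Causal self-attention: token t attends to tokens 0..t."""
    return sum(t + 1 for t in range(seq_len))  # = seq_len * (seq_len + 1) / 2


def recurrent_interactions(seq_len: int) -> int:
    """Generic recurrence: one fixed-size state update per token."""
    return seq_len


def running_state(xs, decay=0.5):
    """Hypothetical stand-in recurrence (NOT RWKV's formulation):
    state = decay * state + x, so history lives in a single state value."""
    state = 0.0
    states = []
    for x in xs:
        state = decay * state + x
        states.append(state)
    return states


# Print sequence length, attention cost, and recurrent cost side by side.
for n in (1_000, 2_000, 4_000):
    print(n, attention_interactions(n), recurrent_interactions(n))
```

Doubling the sequence length roughly quadruples the attention count but only doubles the recurrent one, which is the gap being pointed at here. Real models differ in constants, parallelism, and memory use, so treat this as intuition rather than a performance claim.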