by swyx 837 days ago
(not to toot own horn too much but i believe we were also the first big ai pod to feature rwkv: https://latent.space/p/rwkv )

Based presents the first real challenge to rwkv/mamba i've seen, both of which fall prey to the recall tradeoff referenced in TFA. i do have real questions about how recall can grow unbounded without a tradeoff like that, but then again i haven't seriously studied the math.

2 comments

There's fundamentally a trade-off: https://arxiv.org/abs/2209.04881 . (Unless the Strong Exponential Time Hypothesis is false, but that's pretty unlikely, like P being equal to NP).
The info extraction and question answering metrics are far worse than transformers', though.

They also say that in the blog: "However, both Based and Mamba still underperform the strongest Transformer baseline, sometimes by large margins. This is consistent with our “no free lunch” observation above"