by vicktorium 837 days ago
The RWKV model was mentioned; it is based not on transformers but on RNNs. [1]

The context window is particularly interesting. I interacted with the people on Discord some time ago, and the model seems good but is not widely used yet.
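Part of why a recurrent model's context window is interesting is memory. A transformer decoder caches one key and one value vector per token per layer, so that cache grows linearly with context length, while an RNN carries a fixed-size state forward. The sketch below is back-of-the-envelope arithmetic under simplifying assumptions (plain multi-head attention with no key/value sharing, and a hypothetical 32-layer, `d_model = 4096` configuration); it is not a measurement of RWKV or of any particular model.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, d_model: int, bytes_per_value: float) -> float:
    """Approximate KV-cache size for a decoder-only transformer.

    Assumes plain multi-head attention: every layer stores one key and one
    value vector of width d_model for every token in the context.
    """
    return 2 * n_layers * n_ctx * d_model * bytes_per_value


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical configuration: 32 layers, d_model = 4096.
    for n_ctx in (4096, 8192, 32768):
        fp16 = kv_cache_bytes(32, n_ctx, 4096, 2) / GIB  # 2 bytes per value
        int8 = kv_cache_bytes(32, n_ctx, 4096, 1) / GIB  # 1 byte per value
        print(f"n_ctx={n_ctx:6d}  fp16: {fp16:5.1f} GiB  int8: {int8:5.1f} GiB")
```

Under these assumptions the fp16 cache is 2 GiB at 4,096 tokens and 16 GiB at 32,768 tokens. Quantizing to 8 bits halves every figure, but the growth with context stays linear; a fixed recurrent state does not grow at all.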

People are noticing that, for now, the limitations will not reduce to pure hardware (and from there, energy).

Transformers allow heavy parallelization, but they are too computationally intensive, even with quantization.
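To make the compute point concrete, here is a toy comparison, not RWKV's actual formulation. Naive causal softmax attention has token t score against all t + 1 tokens seen so far, while an unnormalized linear-attention recurrence folds each token into a fixed d × d state. Both functions count multiply-accumulates (MACs) for the main products only. `linear_recurrence` is a hypothetical stand-in for the broad class of RNN-style layers, and the MAC counts are an illustrative cost model, not benchmarks.

```python
import math
import random


def causal_softmax_attention(q, k, v):
    """Naive causal softmax attention over lists of d-dim vectors.

    Token t scores against every token 0..t, so total work grows
    roughly as n^2 * d. Returns (outputs, multiply-accumulate count).
    """
    n, d = len(q), len(q[0])
    outputs, macs = [], 0
    for t in range(n):
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) for s in range(t + 1)]
        macs += (t + 1) * d  # q_t . k_s for every visible s
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        outputs.append([sum(weights[s] * v[s][i] for s in range(t + 1)) / z for i in range(d)])
        macs += (t + 1) * d  # weighted sum of value vectors
    return outputs, macs


def linear_recurrence(q, k, v):
    """Toy RNN-style layer (unnormalized linear attention).

    A fixed d x d state S accumulates v_t k_t^T and the output is S q_t,
    so work grows as n * d^2 and memory does not depend on n.
    """
    n, d = len(q), len(q[0])
    state = [[0.0] * d for _ in range(d)]
    outputs, macs = [], 0
    for t in range(n):
        for r in range(d):
            for c in range(d):
                state[r][c] += v[t][r] * k[t][c]
        macs += d * d  # state update
        outputs.append([sum(state[r][c] * q[t][c] for c in range(d)) for r in range(d)])
        macs += d * d  # state read
    return outputs, macs


def random_sequence(n, d, seed):
    """n random d-dimensional vectors (hypothetical test input)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    d = 8
    for n in (64, 128, 256):
        q, k, v = (random_sequence(n, d, seed) for seed in (0, 1, 2))
        _, attn = causal_softmax_attention(q, k, v)
        _, rec = linear_recurrence(q, k, v)
        print(f"n={n:4d}  attention MACs={attn:8d}  recurrence MACs={rec:6d}")
```

In this cost model, attention needs d·n·(n+1) MACs and the recurrence needs 2·n·d², so doubling n roughly quadruples the first and only doubles the second. Quantization lowers the cost of each MAC without changing how the count scales.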

It seems people are simply trying to run away from the transformer.

[1] https://github.com/BlinkDL/RWKV-LM


(not to toot own horn too much but i believe we were also the first big ai pod to feature rwkv: https://latent.space/p/rwkv )

Based presents the first real challenge to rwkv/mamba i've seen, both of which fall prey to the recall tradeoff referenced in TFA. i do have real questions about how the recall can grow unbounded with no tradeoff like that, but then again i haven't seriously studied the math.
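The recall side of that trade-off can be illustrated with a toy associative memory. It is a stand-in for the fixed-size state that RNN-style models carry, not RWKV's, Mamba's, or Based's actual update rule. Each key/value pair is written into a fixed d × d matrix as an outer product, and "recall" means reading the matrix with a stored key and checking which stored value the readout is closest to. The sizes and the random unit-vector keys are arbitrary assumptions; the point is only that as the number of stored pairs grows relative to the state size, crosstalk between keys swamps the signal.

```python
import math
import random


def random_unit_vector(d, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def recall_accuracy(d, n_pairs, seed=0):
    """Write n_pairs random (key, value) pairs into a fixed d x d state
    S = sum_i v_i k_i^T, then query with each key and return the fraction
    of queries whose readout S k_j scores highest against the correct v_j."""
    rng = random.Random(seed)
    keys = [random_unit_vector(d, rng) for _ in range(n_pairs)]
    values = [random_unit_vector(d, rng) for _ in range(n_pairs)]

    # The state has d * d entries no matter how many pairs are written.
    state = [[0.0] * d for _ in range(d)]
    for k, v in zip(keys, values):
        for r in range(d):
            for c in range(d):
                state[r][c] += v[r] * k[c]

    correct = 0
    for j, k in enumerate(keys):
        # readout = v_j + sum_{i != j} (k_i . k_j) v_i  (the second term is crosstalk)
        readout = [sum(state[r][c] * k[c] for c in range(d)) for r in range(d)]
        scores = [sum(a * b for a, b in zip(readout, v)) for v in values]
        best = max(range(n_pairs), key=scores.__getitem__)
        if best == j:
            correct += 1
    return correct / n_pairs


if __name__ == "__main__":
    d = 16
    for n_pairs in (4, 16, 64, 256):
        print(f"d={d}  pairs={n_pairs:4d}  recall={recall_accuracy(d, n_pairs):.2f}")
```

With a handful of pairs the readout lands on the right value almost every time; by the time the pair count reaches d² = 256, most queries come back wrong. Attention avoids this particular failure by keeping every key and value explicitly, which is exactly the per-token memory and compute cost that sub-quadratic models are trying to escape.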

There's fundamentally a trade-off: https://arxiv.org/abs/2209.04881 (unless the Strong Exponential Time Hypothesis is false, but that's pretty unlikely, like P being equal to NP).
The information extraction and question answering metrics are far worse than for transformers, though.

They also say as much in the blog: "However, both Based and Mamba still underperform the strongest Transformer baseline, sometimes by large margins. This is consistent with our “no free lunch” observation above"