| HN Mirror


by vicktorium 837 days ago

The RWKV model was mentioned, which is based not on transformers but on an RNN. [1]

The context window is particularly interesting. I interacted with the people on Discord some time ago, and the model seems good but is not widely used yet.

People are noticing that the limitations will now shift to pure hardware -> energy.

The transformer allows heavy parallelization, but it's too computationally intensive even with quantization.

People are simply trying to run from the transformer, it seems.

[1] https://github.com/BlinkDL/RWKV-LM

1 comments

swyx 837 days ago

(not to toot own horn too much but i believe we were also the first big ai pod to feature rwkv: https://latent.space/p/rwkv )

Based presents the first real challenge to rwkv/mamba i've seen, both of which fall prey to the recall tradeoff referenced in TFA. i do have real questions on how the recall can grow unbounded with no tradeoff like that, but then again i haven't seriously studied the math.


logicchains 837 days ago

There's fundamentally a trade-off: https://arxiv.org/abs/2209.04881 . (Unless the Strong Exponential Time Hypothesis is false, but that's pretty unlikely, like P being equal to NP).


kartoolOz 837 days ago

The information extraction and question answering metrics are far worse than transformers', though.

They also say that in the blog: "However, both Based and Mamba still underperform the strongest Transformer baseline, sometimes by large margins. This is consistent with our “no free lunch” observation above."
