by vicktorium
837 days ago
The RWKV model was mentioned, which is not based on transformers but on an RNN. [1] The context window is particularly interesting. I interacted with the people on Discord some time ago, and the model seems good but is not widely used yet. People are noticing that the limits are shifting to pure hardware -> energy now. Transformers allow heavy parallelization, but they're too computationally intensive even with quantization. People are simply trying to get away from the transformer, it seems.

[1] https://github.com/BlinkDL/RWKV-LM

Based presents the first real challenge to RWKV/Mamba I've seen, both of which fall prey to the recall tradeoff referenced in TFA. I do have real questions about how the recall can grow unbounded with no tradeoff like that, but then again I haven't seriously studied the math.
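
For anyone who wants the recall tradeoff in concrete terms, here is a rough back-of-the-envelope sketch. It is only an illustration under my own assumptions: the function names and sizes are hypothetical, and this is not the actual state layout of RWKV, Mamba, or Based. It just counts how many numbers one layer has to remember in an attention-style cache (keeps every token) versus a fixed-size recurrent state.

```python
def kv_cache_floats(seq_len: int, d_model: int) -> int:
    """Attention-style memory for one layer: one key and one value vector
    per token seen so far, so it grows linearly with seq_len."""
    return 2 * seq_len * d_model


def recurrent_state_floats(d_state: int, d_model: int) -> int:
    """Fixed-size recurrent memory for one layer (hypothetical d_state x d_model
    matrix): the size does not depend on how many tokens were seen."""
    return d_state * d_model


if __name__ == "__main__":
    d_model, d_state = 64, 64  # toy sizes, chosen only for illustration
    for seq_len in (1_000, 10_000, 100_000):
        print(
            seq_len,
            kv_cache_floats(seq_len, d_model),
            recurrent_state_floats(d_state, d_model),
        )
```

The cache count keeps climbing with context length while the recurrent count stays flat, which is the intuition behind the tradeoff: a fixed state is cheap, but everything the model wants to recall later has to be squeezed into it.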