by pk-protect-ai 916 days ago

Is there a discussion thread for the original paper? I seem to have missed it. The paper is quite interesting, but I didn't follow this part:

"We note that full results on context length 8k are missing for the RWKV and RetNet baselines, prior strong recurrent models that can also be interpreted as SSMs, due to a lack of efficient implementation leading to out-of-memory or unrealistic computation requirements."

RetNet doesn't really consume much memory: with the chunkwise forward implementation, VRAM usage is bounded by the chunk size rather than by the full sequence length. That is exactly the mode you would use to test long context lengths.
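
To make the chunkwise point concrete, here is a minimal NumPy sketch of single-head retention in its parallel and chunkwise forms. It is a simplified illustration, not the RetNet reference implementation: it assumes one head with a scalar decay `gamma`, and it leaves out the rotation, scaling, normalization, and gating used in the paper. The function names (`decay_mask`, `retention_parallel`, `retention_chunkwise`) are made up for this example.

```python
import numpy as np


def decay_mask(n, gamma):
    """Causal decay matrix D with D[i, j] = gamma**(i - j) for i >= j, else 0."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    return np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)


def retention_parallel(q, k, v, gamma):
    """Parallel form: materializes a full (seq_len x seq_len) score matrix."""
    return ((q @ k.T) * decay_mask(len(q), gamma)) @ v


def retention_chunkwise(q, k, v, gamma, chunk):
    """Chunkwise form: only a (chunk x chunk) score matrix plus a small state."""
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    # state summarizes all tokens before the current chunk (shape d_k x d_v)
    state = np.zeros((d_k, d_v))
    out = np.empty((seq_len, d_v))
    for start in range(0, seq_len, chunk):
        qc = q[start:start + chunk]
        kc = k[start:start + chunk]
        vc = v[start:start + chunk]
        b = len(qc)  # the last chunk may be shorter than `chunk`
        pos = np.arange(b)
        # contribution of tokens inside the current chunk
        inner = ((qc @ kc.T) * decay_mask(b, gamma)) @ vc
        # contribution of all earlier chunks, read from the recurrent state
        cross = (qc @ state) * (gamma ** (pos + 1))[:, None]
        out[start:start + b] = inner + cross
        # decay the old state and fold the current chunk into it
        state = gamma ** b * state + (kc * (gamma ** (b - 1 - pos))[:, None]).T @ vc
    return out


rng = np.random.default_rng(0)
q = rng.normal(size=(10, 4))
k = rng.normal(size=(10, 4))
v = rng.normal(size=(10, 3))
same = np.allclose(retention_parallel(q, k, v, 0.9),
                   retention_chunkwise(q, k, v, 0.9, chunk=4))
print(same)
```

In this sketch the parallel form needs an intermediate of size seq_len x seq_len, while the chunkwise form needs chunk x chunk plus a d_k x d_v state, which is why the working memory of the latter does not grow with context length.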

Has anyone run tests on the original Mamba model? How fast is its training compared with RetNet in parallel forward mode?

2 comments

error9348 916 days ago

https://news.ycombinator.com/item?id=38522428

https://openreview.net/forum?id=AL1fq05o7H


MacsHeadroom 916 days ago

Faster training, much faster inference, and about half the VRAM usage during inference.
