Hacker News
by pumanoir 1206 days ago
I think it is feasible. The description even says it is designed to save on VRAM [1]. I don't get the other comments about needing more VRAM than a 3090 has.

Also, Neural Magic may run their sparsification on ARM CPUs in the future, so keep an eye on that.

[1] ChatRWKV v2: with "stream" and "split" strategies. 3G VRAM is enough to run RWKV 14B :)
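A rough sketch of how a "split"/"stream" approach trades VRAM for speed. This is a hypothetical illustration, not ChatRWKV's actual code: it assumes every layer has the same size, keeps as many whole layers resident in VRAM as a budget allows, and copies the remaining layers from CPU RAM to the GPU on each forward pass. The names `plan_split` and `SplitPlan`, the 40-layer / 28 GB model shape, and the 25 GB/s PCIe figure are all made-up assumptions.

```python
from dataclasses import dataclass

GB = 10**9


@dataclass
class SplitPlan:
    resident_layers: int  # layers kept permanently in VRAM
    streamed_layers: int  # layers copied CPU -> GPU on every forward pass
    streamed_bytes: int   # bytes moved over PCIe per forward pass


def plan_split(n_layers: int, bytes_per_layer: int, vram_budget: int) -> SplitPlan:
    """Hypothetical greedy split: keep as many whole layers on the GPU as fit the budget."""
    resident = min(n_layers, vram_budget // bytes_per_layer)
    streamed = n_layers - resident
    return SplitPlan(resident, streamed, streamed * bytes_per_layer)


# Illustrative numbers only: 40 equal layers totalling 28 GB of fp16 weights, 3 GB budget.
plan = plan_split(n_layers=40, bytes_per_layer=28 * GB // 40, vram_budget=3 * GB)
print(plan)

# Assumed effective host-to-device bandwidth (roughly PCIe 4.0 x16 territory).
pcie_bytes_per_s = 25 * GB
print(f"{plan.streamed_bytes / pcie_bytes_per_s:.2f} s of weight transfer per forward pass")
```

Under these assumptions only 4 of 40 layers stay resident, and during token-by-token generation each token needs one forward pass, so the copy time (about a second here) rather than GPU compute tends to dominate.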


You have to split it up, which slows it down a lot. The 14B model doesn't fit entirely on a 3090, though the 7B fits easily and is very fast. The other replies may have meant this, or they thought the original comment was about LLaMA.
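A back-of-the-envelope check of the fit claim, assuming fp16 weights (2 bytes per parameter) and parameter counts of exactly 14B and 7B, and ignoring activations, recurrent state and CUDA context overhead (which only make things tighter). The RTX 3090 has 24 GB of VRAM.

```python
GIB = 2**30
RTX_3090_VRAM = 24 * GIB  # 24 GB card


def fp16_weight_bytes(n_params: int) -> int:
    # 2 bytes per parameter in fp16; weights only, no activations or state
    return 2 * n_params


for name, n_params in [("14B", 14 * 10**9), ("7B", 7 * 10**9)]:
    need = fp16_weight_bytes(n_params)
    fits = need <= RTX_3090_VRAM
    print(f"{name}: {need / GIB:.1f} GiB of weights, fits on a 3090: {fits}")
# 14B: 26.1 GiB of weights, fits on a 3090: False
# 7B: 13.0 GiB of weights, fits on a 3090: True
```

On these assumptions the 14B weights alone overshoot 24 GiB, so part of the model has to live in CPU RAM, while the 7B leaves roughly 11 GiB of headroom.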