by LoganDark
1093 days ago
Oh, you said trained. If it's trained, then the long context length issue may not be as severe. It might still go mad if you let it eat too much of a hundred-page lawsuit, but if you work with portions of it at a time (the way you would with a transformer), RWKV can be vastly more economical than the larger models (requiring a much less powerful GPU, or even running on no GPU at all, thanks to rwkv.cpp). rwkv.cpp in particular depends on a project that would not have existed in its current form without LLaMA, even though the project itself isn't LLaMA-specific. However, there are enough other implementations of CPU inference (at least two?) that I think RWKV could still exist even if LLaMA had never existed.