by LoganDark
1097 days ago
Probably not, honestly. Because it's an RNN, old information gradually deteriorates as new information is fed into the model. That is undesirable compared to e.g. transformers, which can reference any part of the context without degradation but have a hard limit on context size. RWKV can ingest a theoretically infinite number of tokens, but after around 16k it starts to degrade into madness until restarted, so in practice it does sort of have a limit.

The reason it degrades is that a single internal state is updated in place per token, and the current models have only been trained with up to 8192 tokens of context. Once you get to about double that, the state starts to diverge from "sanity", with no known way to correct this. And priming a new instance of the model with 8192 tokens or so of the new context takes a really long time, because an RNN can't process the next token until it has finished with the previous one!

With some fine-tuning (which is still out of reach for most people, unfortunately, but I digress) it can be turned into a pretty good chat model, generate story completions, generate boilerplate code, etc., and the base model is reasonably okay at most of these things already. I think it's definitely a competitor in some areas, though I don't remember whether there have already been benchmarks putting it up against the other models. I do know that it's better than the majority of other open-source models, including transformer-based ones, but this is probably more due to training data than architecture.
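
To make the "single state updated per token" point concrete, here's a toy sketch in Python. To be clear, this is not RWKV's actual update rule: the `step` function, the `DECAY` constant and the 4-element state are all made up for illustration. It only shows the two properties described above: the state stays the same size no matter how many tokens you feed in, and each token has to be folded in after the previous one, so early tokens fade as later ones arrive.

```python
import math

DECAY = 0.9  # hypothetical forgetting factor, chosen only for this demo


def step(state, token_id):
    """Fold one token into the fixed-size state, in place (toy rule, not RWKV's)."""
    for i in range(len(state)):
        state[i] = DECAY * state[i] + (1 - DECAY) * math.sin(token_id + i)
    return state


def prime(tokens, state_size=4):
    """Feed tokens one at a time; each step needs the state left by the previous one."""
    state = [0.0] * state_size
    for token_id in tokens:
        step(state, token_id)
    return state


def first_token_influence(n_later_tokens):
    """How much the final state still depends on which token came first."""
    a = prime([1] + [7] * n_later_tokens)
    b = prime([2] + [7] * n_later_tokens)
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    for n in (0, 10, 100):
        print(n, first_token_influence(n))
```

In this toy version the first token's influence shrinks geometrically with every later token, which is a much simpler kind of forgetting than whatever a trained model really does, but the sequential loop in `prime` is the same reason priming a fresh instance with a long context is slow.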