by dartos
811 days ago
RWKV has shown that you can scale RNNs to large parameter counts. The fact that one person (initially) was able to do it highlights how much low-hanging fruit there is for non-transformer architectures. It’s also a big deal that a small number of people designed, trained, and published 5 versions of a perfectly serviceable model (as in, it has decent summarizing ability, which is the biggest LLM use case) that doesn’t have the quadratic time complexity of transformers.