by dartos
812 days ago
I mean, RWKV seems promising and isn’t a transformer model. Transformers have a first-mover advantage: they were the first models that scaled to large parameter counts. That doesn’t mean they’re the best or that they’ve won, just that they were the first to get big (literally and metaphorically).