|
|
|
|
|
by gamegoblin
1072 days ago
|
|
I think the jury is still out if these will actually scale to ultra-long language understanding sequences. KWKV, for example, is still trained like GPT, but is architected so it can be run as an RNN during inference time. This is awesome, but it is unclear if the training regime will limit the effective use of long-ranging recurrent context. |
|