by cs702
621 days ago
I finally got around to reading this. Nice paper, but it fails to address a key question about RNNs: can they be as good as Transformers at recalling information from earlier tokens in a sequence? Transformers excel at recall, likely because they keep the entire previous context around in an ever-growing KV cache, whereas an RNN has to compress everything it has seen into a fixed-size hidden state. Unless proponents of RNNs conclusively demonstrate that RNNs can recall information from previous context at least as well as Transformers, I'll stick with the latter.
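For anyone who wants the memory asymmetry spelled out, here's a toy sketch. The class names `ToyKVCache` and `ToyRNNState` and the `memory_after` helper are hypothetical, made up for illustration. A real KV cache holds per-layer, per-head tensors, and a real RNN or state-space model uses a learned update rather than the fixed decay assumed here. The point is only that the cache grows with every token while the recurrent state stays the same size.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Toy stand-in for a Transformer KV cache (hypothetical, illustration only)."""
    d_model: int
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, key, value):
        # Each new token contributes one key and one value; nothing is evicted,
        # so every earlier token stays available for attention.
        self.keys.append(key)
        self.values.append(value)

    def num_floats(self):
        return 2 * len(self.keys) * self.d_model


@dataclass
class ToyRNNState:
    """Toy stand-in for an RNN hidden state (hypothetical, illustration only)."""
    d_state: int
    h: list = field(init=False)

    def __post_init__(self):
        self.h = [0.0] * self.d_state

    def update(self, x, decay=0.5):
        # Assumed fixed-decay update: the new input is blended into a
        # fixed-size vector, so older information is progressively overwritten.
        self.h = [decay * h_i + (1 - decay) * x_i for h_i, x_i in zip(self.h, x)]

    def num_floats(self):
        return len(self.h)


def memory_after(n_tokens, d=4):
    """Return (kv_cache_floats, rnn_state_floats) after reading n_tokens."""
    cache, rnn = ToyKVCache(d), ToyRNNState(d)
    for t in range(n_tokens):
        x = [float(t)] * d  # placeholder token embedding
        cache.append(x, x)
        rnn.update(x)
    return cache.num_floats(), rnn.num_floats()


print(memory_after(10))    # (80, 4)
print(memory_after(1000))  # (8000, 4)
```

The cache's footprint scales linearly with sequence length while the RNN state doesn't budge, which is exactly why the recall question matters: whatever the RNN needs later has to survive being squeezed into those few numbers.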