
by cs702 621 days ago

I finally got around to reading this. Nice paper, but it fails to address a key question about RNNs:

Can RNNs be as good as Transformers at recalling information from previous tokens in a sequence?

Transformers excel at recalling info, likely because they keep all previous context around in an ever-growing KV cache.
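
To make the contrast concrete, here is a rough sketch of what I mean, not anything taken from the paper. `KVCache`, `FixedState`, and `run_demo` are hypothetical names, and the "RNN" is just an exponential moving average standing in for any fixed-size recurrent state. The cache stores one key/value pair per token and can pull an early token back out by attention, while the recurrent state stays the same size no matter how long the sequence gets.

```python
import math


class KVCache:
    """Toy attention memory: stores every token's key and value (assumed layout)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def num_floats(self):
        # Memory grows linearly with the number of tokens seen.
        return sum(len(k) for k in self.keys) + sum(len(v) for v in self.values)

    def recall(self, query):
        # Softmax attention over all cached keys (max-subtracted for stability).
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]


class FixedState:
    """Toy stand-in for an RNN: a fixed-size state updated by a moving average."""

    def __init__(self, dim, decay=0.9):
        self.state = [0.0] * dim
        self.decay = decay

    def update(self, x):
        self.state = [
            self.decay * s + (1.0 - self.decay) * xi
            for s, xi in zip(self.state, x)
        ]

    def num_floats(self):
        # Memory is constant regardless of sequence length.
        return len(self.state)


def run_demo(n_tokens=8):
    cache = KVCache()
    rnn = FixedState(dim=1)
    for i in range(n_tokens):
        # One-hot keys (scaled so the softmax is sharp); the value is the token index.
        key = [10.0 if j == i else 0.0 for j in range(n_tokens)]
        value = [float(i)]
        cache.append(key, value)
        rnn.update(value)
    # Query for token 2 specifically.
    query = [10.0 if j == 2 else 0.0 for j in range(n_tokens)]
    return cache.recall(query), cache.num_floats(), rnn.num_floats()


if __name__ == "__main__":
    recalled, cache_floats, state_floats = run_demo()
    print("recalled value:", recalled)
    print("floats held by cache:", cache_floats)
    print("floats held by fixed state:", state_floats)
```

The cache gets exact lookup by paying for it in memory; the fixed state has blended everything into one number, so whether token 2 can still be recovered from it is exactly the open question.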

Unless proponents of RNNs conclusively demonstrate that RNNs can recall info from previous context at least as well as Transformers, I'll stick with the latter.