| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by logicchains 665 days ago
	The model in the paper isn't a "real" RNN due making it parallelizable, for same the reasons described in https://arxiv.org/abs/2404.08819 , and hence is theoretically less powerful than a "real" RNN (struggles at some classes of problems that RNNs traditionally excel at). On the other hand, https://arxiv.org/abs/2405.04517 contains a "real" RNN component, which demonstrates a significant improvement on the kind of state-tracking problems that transformers struggle with.

1 comments

robertsdionne 665 days ago

These are real RNNs, they still depend upon the prior hidden state, it’s just that the gating does not. The basic RNN equation can be parallelized with parallel prefix scan algorithms.

link