


	by imjonse 665 days ago
	To their credit, the authors (Y. Bengio among them) end the paper with the question, not suggesting they know the answer. These models are very small even by academic standards, so any finding would not necessarily extend to current LLM scales. The main conclusion is that RNN-class networks can be trained as efficiently as modern alternatives, but the resulting performance is only competitive at small scale.

1 comments

phkahler 665 days ago

>> These models are very small even by academic standards, so any finding would not necessarily extend to current LLM scales.

Emphasis on not necessarily.

>> The main conclusion is that RNN-class networks can be trained as efficiently as modern alternatives, but the resulting performance is only competitive at small scale.

Shouldn't the conclusion be "the resulting competitive performance has only been confirmed at small scale"?


imjonse 665 days ago

Yes, that is clearer indeed. However, S4 and Mamba-class models have also performed well at small scale, then started lagging at larger model sizes and longer context lengths, or on particular tasks.
