


	by korbip 769 days ago
	Thank you! I can say that it is not really a diminishing factor at the scales reported in the paper. So, xLSTM[7:1] is pretty much on par with xLSTM[1:0] in speed. We show that it is helpful on toy tasks, and it shows even better sequence extrapolation performance, so yes.