
Hacker News


	by mathis 328 days ago
	This might be purer, but there is nothing to be gained. On the contrary, it would lead to very long sequences, and standard self-attention scales quadratically with sequence length.
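	To illustrate the scaling point, here is a minimal sketch. It assumes standard dense scaled dot-product attention and, for brevity, uses the inputs directly as queries and keys with no learned projections. The function names and the 512 vs. 2048 lengths are hypothetical and are not from the comment.

```python
import math

def self_attention_weights(x):
    """Naive scaled dot-product self-attention weights (illustrative sketch).

    x is a list of n vectors (lists of d floats). x is used directly as
    queries and keys (no learned projections). Returns an n x n matrix with
    one weight per (query, key) pair, which is where the quadratic cost lies.
    """
    d = len(x[0])
    weights = []
    for q in x:  # n queries ...
        # ... each scored against all n keys
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])  # softmax over keys
    return weights

def score_matrix_entries(n):
    """Number of pairwise scores dense self-attention computes for length n."""
    return n * n

# Hypothetical lengths: the same input as 512 units, or 4x longer as 2048 units.
ratio = score_matrix_entries(2048) / score_matrix_entries(512)
print(ratio)  # 16.0
```

	A 4x longer sequence means 16x more pairwise scores to compute and store, which is the usual sense in which self-attention "scales poorly".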