HN Mirror



	by Translationaut 900 days ago
	This seems to work only because large GPTs have redundant, undercomplex attention heads. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
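	One rough way to put numbers on "redundant" and "undercomplex" is to compare heads within a layer. Two heads whose attention maps are nearly identical are redundant. A head whose rows have low entropy is undercomplex, since each query puts almost all its weight on one or two keys. The sketch below is only an illustration under assumptions: the helper names, the choice of mean row entropy and cosine similarity over flattened maps, the 0.95 threshold, and the toy 3x3 maps are all hypothetical. None of it is taken from BertViz or from real Llama weights.

```python
import math
from typing import List, Tuple

# One attention map: row i is query position i's distribution over keys (sums to 1).
# Hypothetical representation chosen for this sketch.
Matrix = List[List[float]]


def row_entropy(row: List[float]) -> float:
    """Shannon entropy (nats) of one attention distribution; zero weights contribute nothing."""
    return -sum(p * math.log(p) for p in row if p > 0)


def mean_entropy(attn: Matrix) -> float:
    """Average entropy over query rows. Near 0 means each query attends to almost a single key."""
    return sum(row_entropy(r) for r in attn) / len(attn)


def cosine(a: Matrix, b: Matrix) -> float:
    """Cosine similarity between two attention maps, flattened into vectors."""
    x = [v for r in a for v in r]
    y = [v for r in b for v in r]
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny)


def redundant_pairs(heads: List[Matrix], threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Index pairs of heads whose maps are at least `threshold` cosine-similar (assumed cutoff)."""
    pairs = []
    for i in range(len(heads)):
        for j in range(i + 1, len(heads)):
            if cosine(heads[i], heads[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy causal heads (made-up numbers, purely illustrative).
h0 = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]        # every query -> first token
h1 = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.0, 0.05]]      # almost the same as h0
h2 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1/3, 1/3, 1/3]]        # uniform over visible keys

if __name__ == "__main__":
    heads = [h0, h1, h2]
    for k, h in enumerate(heads):
        print(f"head {k}: mean entropy = {mean_entropy(h):.3f} nats")
    print("redundant pairs:", redundant_pairs(heads))
```

	With these toy maps, heads 0 and 1 come out as a redundant pair (cosine about 0.997), and head 0 has zero entropy because it only ever looks at the first token. On a real model you would feed in per-head attention tensors instead. The metrics are deliberately crude: they flag candidates for pruning or merging and say nothing about what a head actually computes.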