| HN Mirror

	by aik 920 days ago
	Do you have an example where these methods still produce good summaries? E.g., if you change how self-attention is recomputed between token generations in autoregressive decoding, so that significantly less computation is needed?