| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by dartos 564 days ago
	I may be missing something, but I thought that each context token would result in an 3 additional parameters per context token for self attention to build its map, since each attention must calculate a value considering all existing context