| HN Mirror

	by nullc 615 days ago
	They're memory-bandwidth limited: you can basically estimate the performance from the time it takes to read the entire model from RAM once per token.
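
A rough sketch of that estimate, assuming every generated token requires exactly one full read of the weights and nothing else costs time. The function name and the example numbers (a 7-billion-parameter model at 2 bytes per parameter, 50 GB/s of memory bandwidth) are hypothetical, picked only to show the arithmetic; real throughput will come in below this bound.

```python
def estimate_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if each token needs one full pass over the weights."""
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    # Time to stream the whole model from memory once.
    seconds_per_token = model_bytes / bandwidth_bytes_per_s
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical inputs, not measurements.
    model_bytes = 7e9 * 2   # 7B parameters at 2 bytes each = 14 GB
    bandwidth = 50e9        # 50 GB/s memory bandwidth
    rate = estimate_tokens_per_second(model_bytes, bandwidth)
    print(f"~{rate:.1f} tokens/s upper bound")
```

Halving the model size (for example via quantization) or doubling the bandwidth doubles the bound, which is the whole point of the rule of thumb.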