| HN Mirror



	by brucethemoose2 1042 days ago
	For prompt ingestion... I dunno. Unbatched token generation is basically RAM-bandwidth limited, as the entire model has to be read from memory for each token. I bet theoretical performance is similar to the GPU's, albeit with much lower power consumption.
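
	A rough back-of-envelope sketch of that bandwidth argument, in Python. It assumes every weight is read exactly once per generated token and ignores compute, caches, and KV-cache traffic, so it is only an upper bound. The model size and bandwidth figures are made-up placeholders, not measurements of any real chip.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on unbatched decode speed.

    Assumes the whole model is streamed from memory once per token,
    so the time per token is at least model_bytes / bandwidth.
    """
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_bytes_per_s / model_bytes


GB = 1e9

# Hypothetical model: 7e9 parameters at 4 bits (0.5 bytes) each.
model_bytes = 7e9 * 0.5

# Hypothetical memory bandwidths, for illustration only.
for label, bandwidth in [("50 GB/s", 50 * GB), ("400 GB/s", 400 * GB)]:
    bound = max_tokens_per_second(model_bytes, bandwidth)
    print(f"{label}: at most {bound:.1f} tokens/s")
```

	If two devices have similar memory bandwidth, this bound comes out about the same for both regardless of how much raw compute each one has.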