


	by agnokapathetic 921 days ago
	Batch size 1: this is a terrible benchmark that really only measures memory bandwidth. LLM inference on Llama2-70b is memory-bound up to a batch size of a half dozen or so.
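	A rough roofline-style sketch of why that happens, not a measurement. It assumes every decode step streams all the weights from memory once and spends about 2 FLOPs per parameter per token, and it ignores the KV cache and attention cost. The function names are made up for illustration, and the hardware figures in the usage lines (20 TFLOP/s, 800 GB/s, 4-bit weights) are placeholders rather than any specific device; the crossover batch size depends entirely on those numbers.

```python
def decode_step_time(params, bytes_per_param, batch, peak_flops, mem_bw):
    """Estimate one decode step as max(weight-read time, matmul time).

    Returns (seconds_per_step, is_memory_bound).
    """
    # All weights are read once per step, regardless of batch size.
    t_mem = params * bytes_per_param / mem_bw
    # Roughly 2 FLOPs per parameter for each sequence in the batch.
    t_compute = 2 * params * batch / peak_flops
    return max(t_mem, t_compute), t_mem >= t_compute


def crossover_batch(bytes_per_param, peak_flops, mem_bw):
    """Batch size at which compute time catches up with weight-read time."""
    return bytes_per_param * peak_flops / (2 * mem_bw)


# Placeholder numbers (assumptions, not a real device or a real benchmark).
PARAMS = 70e9          # 70B-parameter model
BYTES_PER_PARAM = 0.5  # 4-bit quantized weights
PEAK_FLOPS = 20e12     # 20 TFLOP/s
MEM_BW = 800e9         # 800 GB/s

for batch in (1, 4, 8, 16):
    step, mem_bound = decode_step_time(PARAMS, BYTES_PER_PARAM, batch, PEAK_FLOPS, MEM_BW)
    kind = "memory-bound" if mem_bound else "compute-bound"
    print(f"batch={batch:2d}  {batch / step:7.1f} tok/s total  ({kind})")

print("crossover batch ~", crossover_batch(BYTES_PER_PARAM, PEAK_FLOPS, MEM_BW))
```

	Under this model the step time is flat while memory-bound, so total tokens/s scales almost linearly with batch size until the crossover. A batch-size-1 number therefore mostly reflects memory bandwidth and says little about compute throughput.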