| HN Mirror



	by jensb1 1015 days ago
	Could someone explain how token generation speed relates to the latency before the first token is output? And if anyone has any latency metrics for the 70B model on a 4090, that would be very helpful.

2 comments

4090Ti? AFAIK that has never been released

typo

They're unrelated. What matters for first-token latency is prompt processing time (prompt processing runs in the high hundreds of tokens per second), not generation speed.
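To put rough numbers on that, here is a back-of-the-envelope sketch. The function names and every figure in it (800 tokens/s prompt processing, 20 tokens/s generation, a 2000-token prompt) are made up for illustration and are not measurements from a 4090 or any 70B model. It only shows that the wait for the first token scales with prompt length divided by prompt processing speed, while generation speed affects how long the rest of the reply takes.

```python
def time_to_first_token(prompt_tokens: int, prompt_tps: float) -> float:
    """Approximate seconds spent processing the prompt before any output appears."""
    return prompt_tokens / prompt_tps


def total_time(prompt_tokens: int, prompt_tps: float,
               output_tokens: int, gen_tps: float) -> float:
    """Rough end-to-end estimate: prompt processing plus token-by-token generation."""
    return time_to_first_token(prompt_tokens, prompt_tps) + output_tokens / gen_tps


# Hypothetical numbers, chosen only to illustrate the arithmetic.
ttft = time_to_first_token(prompt_tokens=2000, prompt_tps=800.0)
total = total_time(prompt_tokens=2000, prompt_tps=800.0,
                   output_tokens=100, gen_tps=20.0)
print(f"time to first token: {ttft:.2f} s, total: {total:.2f} s")
```

Doubling the generation speed in this sketch leaves the first-token wait unchanged; only a shorter prompt or faster prompt processing reduces it.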