
Hacker News


	by chc4 11 days ago
	is it just me that thinks it's kinda weird that they conflate speed in tokens/second with latency, when i think of latency as time to first token? like, it generates an entire paragraph of tokens faster, but wouldn't it still be slower if your reply is only 1 word, because it has to do the entire 256 tokens as a chunk?
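
	to put rough numbers on that, here's a minimal sketch. every figure in it is made up for illustration (the step times are assumptions, not measurements of any real model), and the function name is hypothetical. it only assumes what's described above: one model emits a single token per step, the other can only emit a whole 256-token chunk per step.

```python
import math

def seconds_until_done(reply_tokens: int, tokens_per_step: int, seconds_per_step: float) -> float:
    """Time until a reply of `reply_tokens` tokens is complete, when the
    model can only emit tokens in whole steps of `tokens_per_step`."""
    steps = math.ceil(reply_tokens / tokens_per_step)
    return steps * seconds_per_step

# Hypothetical token-at-a-time model: 1 token per 0.02 s step (50 tokens/second).
# Hypothetical chunked model: 256 tokens per 1.0 s step (256 tokens/second).
one_word_stepwise = seconds_until_done(1, 1, 0.02)
one_word_chunked = seconds_until_done(1, 256, 1.0)
paragraph_stepwise = seconds_until_done(256, 1, 0.02)
paragraph_chunked = seconds_until_done(256, 256, 1.0)

print(f"1-token reply:   stepwise {one_word_stepwise:.2f}s, chunked {one_word_chunked:.2f}s")
print(f"256-token reply: stepwise {paragraph_stepwise:.2f}s, chunked {paragraph_chunked:.2f}s")
```

	with these assumed numbers the chunked model has the higher tokens/second and wins on the full paragraph, but the 1-token reply still waits for a whole chunk step, so time to first token is worse. whether that holds in practice depends on the real step times, which this sketch does not know.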