| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by latchkey 596 days ago
	Exactly. Latency is less relevant if you have to have 4 literal servers (each taking up a whole rack) to push out one single 70B model and we don't know how many concurrent user requests that actually services (probably 1).