Hacker News


	by rohansood15 23 days ago
	Are you comparing single-user requests or multiple concurrent requests when you say it's comparable to a rented GPU? Most of the cost efficiency kicks in with concurrent/batched requests. A single H100 node can deliver roughly 5k input + 2k output tok/s on a model like Qwen 3.6 35B-A3B with 30+ concurrent requests.
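To make the batching arithmetic concrete, here is a rough back-of-envelope sketch. It uses only the aggregate figures quoted above (about 5k input + 2k output tok/s at about 30 concurrent requests). The hourly node price is a hypothetical placeholder. The sketch also assumes the node stays saturated and that throughput is split evenly across requests. Real serving stacks will deviate from both assumptions.

```python
# Back-of-envelope throughput/cost arithmetic for the figures quoted above.
# Assumptions (hypothetical, not from the comment): the hourly node price
# passed in by the caller, that "30+" concurrent requests is taken as 30,
# and that aggregate throughput is shared evenly across requests.

SECONDS_PER_HOUR = 3600


def per_request_output_rate(aggregate_output_tok_s: float, concurrency: int) -> float:
    """Average output tok/s each request sees if the aggregate is split evenly."""
    return aggregate_output_tok_s / concurrency


def cost_per_million_tokens(hourly_price: float, tokens_per_second: float) -> float:
    """Price per 1M tokens when the node runs flat out at this aggregate throughput."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return hourly_price / (tokens_per_hour / 1_000_000)


if __name__ == "__main__":
    input_tok_s = 5_000   # aggregate input (prefill) throughput from the comment
    output_tok_s = 2_000  # aggregate output (decode) throughput from the comment
    concurrency = 30      # "30+ concurrent requests", taken as 30

    hourly_price = 20.0   # HYPOTHETICAL $/hour for the node; substitute a real quote

    rate = per_request_output_rate(output_tok_s, concurrency)
    blended = cost_per_million_tokens(hourly_price, input_tok_s + output_tok_s)
    print(f"per-request output: {rate:.1f} tok/s")
    print(f"blended cost: ${blended:.2f} per 1M tokens (input + output)")
```

The node's hourly cost is fixed, so cost per token scales inversely with aggregate throughput. That is why concurrent/batched throughput, rather than single-stream speed, drives the comparison with a rented GPU.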