	by mike_hearn 273 days ago
	But was that measured with batching? It makes a big difference. With LLM inference you can run many requests in parallel on the same card, so single-request numbers understate what the hardware can actually serve.
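
	A rough way to see why batching matters is the back-of-the-envelope model below. It is only a sketch under simplifying assumptions, not a benchmark: it assumes each decode step has to stream the full model weights from GPU memory once regardless of batch size, that arithmetic cost grows linearly with batch size, and it ignores KV-cache traffic and scheduling overhead. The function name and every number are hypothetical, chosen for illustration only.

```python
def decode_tokens_per_second(batch_size, weight_bytes, mem_bandwidth,
                             flops_per_token, peak_flops):
    """Estimate aggregate decode throughput with a simple roofline-style model."""
    # Weights are read from GPU memory once per step and shared by the whole batch.
    memory_time = weight_bytes / mem_bandwidth
    # Arithmetic work scales with the number of sequences decoded in the step.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step takes as long as the slower of the two (assumed to overlap fully).
    step_time = max(memory_time, compute_time)
    # Each step emits one token per sequence in the batch.
    return batch_size / step_time


# Hypothetical card and model: 14 GB of weights, 1 TB/s memory bandwidth,
# 14 GFLOP per generated token, 100 TFLOP/s peak compute.
HYPOTHETICAL = dict(weight_bytes=14e9, mem_bandwidth=1e12,
                    flops_per_token=14e9, peak_flops=100e12)

for batch in (1, 8, 32, 128):
    rate = decode_tokens_per_second(batch, **HYPOTHETICAL)
    print(f"batch={batch:4d}  ~{rate:,.0f} tokens/s total")
```

	Under these assumptions total throughput grows almost linearly with batch size until the arithmetic becomes the bottleneck, which is why a figure measured with one request at a time says little about what a card can do when serving many.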