by mike_hearn 226 days ago
But was that with batching? It makes a big difference: if you're doing LLM inference, you can run many requests in parallel on the same card.
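Here is a toy sketch of why batching helps. It assumes a single dense layer (a weight matrix `W`) stands in for the whole model, and it treats "loading a weight row" as the cost being shared. That is a simplification: real inference servers are far more involved. The function names are hypothetical. Both paths produce identical outputs, but the batched path reads each weight row once per step instead of once per request.

```python
# Toy model (hypothetical, not a real inference engine): one layer = matrix W.
# We count how many times a weight row is "loaded" to show what batching saves.

def sequential_forward(W, requests):
    """Serve requests one at a time; return outputs and weight-row loads."""
    outputs, loads = [], 0
    for x in requests:
        out = []
        for row in W:
            loads += 1  # the same row is fetched again for every request
            out.append(sum(w * xi for w, xi in zip(row, x)))
        outputs.append(out)
    return outputs, loads


def batched_forward(W, requests):
    """Serve all requests in one pass; each weight row is loaded once."""
    outputs = [[] for _ in requests]
    loads = 0
    for row in W:
        loads += 1  # fetched once, reused for every request in the batch
        for i, x in enumerate(requests):
            outputs[i].append(sum(w * xi for w, xi in zip(row, x)))
    return outputs, loads


W = [[1, 2], [3, 4], [5, 6]]          # 3 weight rows
requests = [[1, 0], [0, 1], [2, 3]]   # 3 concurrent requests

seq_out, seq_loads = sequential_forward(W, requests)
bat_out, bat_loads = batched_forward(W, requests)
print(seq_out == bat_out, seq_loads, bat_loads)  # True 9 3
```

With 3 requests, the weights are read 3 times as often without batching. On a GPU, decoding is often limited by how fast the weights can be streamed from memory, so sharing that read across a batch is roughly where the extra throughput comes from.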