Hacker News
by mike_hearn 226 days ago
But was that with batching? It makes a big difference. With LLM inference you can run many requests in parallel on the same card.
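To put rough numbers on why batching matters, here is a back-of-the-envelope sketch. It assumes decoding is limited by memory bandwidth, so each step reads the model weights once no matter how many requests share the batch. Every figure in `HW` is hypothetical (roughly a 7B-parameter model in fp16 on a card with 1 TB/s of bandwidth), the function names are mine, and the model ignores KV-cache traffic, which grows with batch size.

```python
def decode_step_seconds(batch_size, weight_bytes, mem_bandwidth,
                        flops_per_token, peak_flops):
    """Rough roofline estimate of one decoding step for a whole batch."""
    # The weights are streamed from memory once per step and shared by
    # every request in the batch, so this term does not depend on batch_size.
    memory_time = weight_bytes / mem_bandwidth
    # Arithmetic work does grow with the number of requests in the batch.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step takes as long as the slower of the two.
    return max(memory_time, compute_time)


def tokens_per_second(batch_size, **hw):
    """Total tokens generated per second across all requests in the batch."""
    return batch_size / decode_step_seconds(batch_size, **hw)


# Hypothetical hardware and model figures, for illustration only.
HW = dict(
    weight_bytes=14e9,     # ~7B parameters at 2 bytes each (assumed)
    mem_bandwidth=1e12,    # 1 TB/s (assumed)
    flops_per_token=14e9,  # ~2 FLOPs per parameter per token (assumed)
    peak_flops=100e12,     # 100 TFLOP/s (assumed)
)

if __name__ == "__main__":
    for batch in (1, 8, 64, 1000):
        print(batch, round(tokens_per_second(batch, **HW)))
```

Under these assumptions total throughput scales almost linearly with batch size until the compute term takes over, which is why a benchmark run with one request at a time can badly understate what a single card can serve.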