by madisonmay 1261 days ago
Interestingly, it sounds like offloading could be made quite efficient in a batch setting if you primarily care about throughput rather than latency. Though I guess latency is quite important for most current LLM applications.
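
To make the throughput-versus-latency point concrete, here is a rough back-of-the-envelope sketch. It assumes a toy cost model that is not from any particular system: each decoding step pays a fixed cost to stream the offloaded weights in, plus a small per-sequence compute cost, with no overlap between the two. The function names and the numbers (`transfer_s`, `compute_per_seq_s`) are hypothetical and chosen only for illustration.

```python
def step_time(batch_size, transfer_s=1.0, compute_per_seq_s=0.01):
    """Seconds for one decoding step under the toy model.

    transfer_s: hypothetical fixed cost of streaming offloaded weights in.
    compute_per_seq_s: hypothetical compute cost per sequence in the batch.
    Transfer and compute are assumed not to overlap (a pessimistic choice).
    """
    return transfer_s + compute_per_seq_s * batch_size


def throughput(batch_size, **kwargs):
    """Tokens produced per second, summed over the whole batch."""
    return batch_size / step_time(batch_size, **kwargs)


if __name__ == "__main__":
    for b in (1, 8, 64):
        # Per-step latency grows slowly, while aggregate throughput grows
        # almost linearly until compute starts to dominate the transfer cost.
        print(f"batch={b:3d}  step={step_time(b):.2f}s  "
              f"throughput={throughput(b):.2f} tok/s")
```

Under these assumptions the weight-transfer cost is shared by every sequence in the batch, so aggregate throughput rises with batch size, but each individual request still waits for at least one full transfer per token. That is why this looks attractive for offline batch jobs and much less so for interactive use.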