by madisonmay
1261 days ago
Interestingly, it sounds like offloading could be made quite efficient in a batch setting if you primarily care about throughput rather than latency. Though I guess latency is quite important for most current LLM applications.
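
A rough back-of-the-envelope sketch of that tradeoff, in case it helps. The cost model and both timing constants here are made-up assumptions for illustration, not measurements of any real system: I'm just assuming each decoding step pays a fixed cost to stream the offloaded weights in, plus a small compute cost per sequence in the batch.

```python
def offload_step(batch_size, transfer_s=1.0, compute_per_seq_s=0.01):
    """Toy model of one decoding step with offloaded weights.

    transfer_s: hypothetical fixed time to stream the weights in once per step.
    compute_per_seq_s: hypothetical compute time per sequence in the batch.
    Returns (seconds per step, tokens generated per second across the batch).
    """
    step_latency = transfer_s + batch_size * compute_per_seq_s
    throughput = batch_size / step_latency
    return step_latency, throughput


for b in (1, 8, 64):
    latency, throughput = offload_step(b)
    print(f"batch={b:3d}  latency={latency:.2f} s/step  throughput={throughput:.1f} tokens/s")
```

Under those assumptions the fixed transfer cost gets amortized over the batch, so throughput climbs with batch size while every individual request still waits at least the full transfer time per token.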