by Aurornis
137 days ago
That's not how this works. LLM serving at scale batches multiple requests and processes them in parallel for efficiency. Reduce the parallelism and each individual request gets processed faster, but overall throughput (total tokens processed across all requests in the same amount of time) is lower.
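
To put rough numbers on that tradeoff, here's a toy model, not a benchmark. It assumes each decode step has a fixed cost plus a small per-request cost, and that each step emits one token per request in the batch. The function names and the `fixed_ms` / `per_seq_ms` values are made up for illustration, not measurements from any real serving stack.

```python
# Toy model of batched LLM decoding. All numbers are illustrative assumptions.

def step_time_ms(batch_size, fixed_ms=20.0, per_seq_ms=0.5):
    """Time for one decode step that emits one token per request in the batch."""
    return fixed_ms + per_seq_ms * batch_size


def per_request_tokens_per_s(batch_size):
    """Token rate seen by a single request: one token per decode step."""
    return 1000.0 / step_time_ms(batch_size)


def total_tokens_per_s(batch_size):
    """Aggregate token rate across every request in the batch."""
    return batch_size * per_request_tokens_per_s(batch_size)


for b in (1, 8, 64):
    print(
        f"batch={b:3d}  "
        f"per-request={per_request_tokens_per_s(b):6.1f} tok/s  "
        f"total={total_tokens_per_s(b):7.1f} tok/s"
    )
```

Under these assumptions, shrinking the batch raises the per-request rate while the total rate falls, which is the tradeoff described above.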