|
|
|
|
|
by fulafel
578 days ago
|
|
Indeed it says "It enhances performance in areas where tasks are waiting for IO to complete by allowing the CPU to handle other tasks in the meantime". To restate my comment: I argued (1) this CPU cost would be very marginal compared to the LLM API compute cost, and (2) the CPU blocking claim doesn't really hold, due to the wonders of threads and processes. |
|
1. A blocking call, which can take 3-10 seconds.
2. A streaming call, which can also take 3-10 seconds but where the content is streaming directly to you as it is generated (and you may be proxying it through to your user).
In both cases you risk blocking a thread or process for several seconds. That's the problem asyncio solves for you - it means you could have hundreds (or even thousands) of users all waiting for the response to that LLM call without needing hundreds or thousands of blocked threads/processes.