|
|
|
|
|
by fulafel
568 days ago
|
|
(Disclaimer: I haven't used Django in a long time) Can you expand more on why these AI cases make the complexity tradeoff different? I'd imagine think waiting on a 3rd party LLM API call would be computationally very inexpensive compared to what's going on at the business end of that API call. Further lowering the cost, Django is usually configured to use multiple threads and/or processes so that this blocking call won't keep a CPU idle, no? |
|
They’re very slow. Like, several seconds to get a response slow. If you’re serving a very large number of very fast requests, you can argue that the simplicity of the sync model makes it worth it to just scale up the number of processes required to serve that many requests, but the LLM calls are slow enough that it means you need to dramatically scale up the number of serving processes available if you’re going to keep the sync model, and that’s mostly to have CPUs sitting around idle waiting for the LLM to come back. The async model can also let you parallelize calls to the LLMs if you’re making multiple independent calls within the same request - this can cut multiple seconds off your response time.