| HN Mirror

This is actually a major issue in the LLM wrapper space. When you build things like agents (which I think are insanely overhyped and I am so out on but won’t elaborate on), usually in Python, you are making requests that might take 1-5 seconds to complete, with dependencies between responses, so you basically need expert-level async knowledge to build anything interesting. For example, say you want two agents talking to each other and “thinking” independently in the same single-threaded Python process. You need to write your code in such a way that one agent thinking (making a multi-second call to an LLM) does not block the other from thinking, but at the same time, when the agents talk to each other, they shouldn’t talk over each other. Now imagine you have n of these agents in the same program, say behind an async endpoint on a FastAPI server. It gets complicated quickly.
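To make the shape of the problem concrete, here is a minimal sketch using only the standard library's `asyncio`. Everything in it is an assumption for illustration: `fake_llm_call` stands in for a real LLM request (simulated with `asyncio.sleep`), and `agent`, `run_agents`, `floor`, and `transcript` are hypothetical names. The awaited call lets agents "think" concurrently, while a shared `asyncio.Lock` ensures only one agent "talks" at a time.

```python
import asyncio


async def fake_llm_call(prompt: str, delay: float) -> str:
    # Hypothetical stand-in for a slow LLM request; a real client call would go here.
    await asyncio.sleep(delay)
    return f"reply to {prompt!r}"


async def agent(name: str, turns: int, delay: float,
                floor: asyncio.Lock, transcript: list) -> None:
    for turn in range(turns):
        # "Thinking": awaiting yields to the event loop, so other agents keep running.
        await fake_llm_call(f"{name} turn {turn}", delay)
        # "Talking": the lock is held for the whole message, so nobody talks over anyone.
        async with floor:
            transcript.append((name, "start", turn))
            await asyncio.sleep(0.001)  # simulated time spent delivering the message
            transcript.append((name, "end", turn))


async def run_agents(n: int = 2, turns: int = 2, delay: float = 0.05) -> list:
    floor = asyncio.Lock()  # shared "speaking floor" for all agents
    transcript: list = []
    await asyncio.gather(
        *(agent(f"agent{i}", turns, delay, floor, transcript) for i in range(n))
    )
    return transcript


if __name__ == "__main__":
    for entry in asyncio.run(run_agents()):
        print(entry)
```

Even this toy version only covers the easy part. Once the agents' prompts depend on each other's messages, or the whole thing runs inside a request handler with cancellation and timeouts, the coordination logic grows well beyond a single lock.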