| > If you define "hallucinations" to mean "any mistakes at all" then yes, a compiler won't catch them for you. That's not quite my definition. If we're judging these tools by the same criteria we use to judge human programmers, then mistakes and bugs should be acceptable. I'm fine with this to a certain extent, even though these tools are being marketed as having superhuman abilities. But the problem is that LLMs create an entirely unique class of issues that most humans don't. Using nonexistent APIs is just one symptom of it. Like I mentioned in the comment below, they might hallucinate requirements that were never specified, or fixes for bugs that don't exist, all the while producing code that compiles and runs without errors. But let's assume that we narrow down the definition of hallucination to usage of nonexistent APIs. Your proposed solution is to feed the error back to the LLM. Great, but can you guarantee that the proposed fix will also not contain hallucinations? As I also mentioned, in most occasions when I've done this the LLM simply produces more hallucinated code, and I get stuck in a neverending loop where the only solution is for me to dig into the code and fix the issue myself. So the LLM simply wastes my time in these cases. > The new Phoenix.new coding agent actively tests the web applications it is writing using a headless browser That's great, but can you trust that it will cover all real world usage scenarios, test edge cases and failure scenarios, and do so accurately? Tests are code as well, and it can have the same issues as application code. I'm sure that we can continue to make these tools more useful by working around these issues and using better adjacent tooling as mitigation. But the fundamental problem of hallucinations still needs to be solved. Mainly because it affects tasks other than code generation, where it's much more difficult to deal with. |
You do it in a loop. Keep looping and fixing until the code runs.
> but can you trust that it will cover all real world usage scenarios, test edge cases and failure scenarios, and do so accurately?
Absolutely not. Most of my blog entry about why code hallucinations aren't as dangerous as other mistakes talks about that as being the real problem humans need to solve when using LLMs to write code: https://simonwillison.net/2025/Mar/2/hallucinations-in-code/...
From the start of that article:
> The real risk from using LLMs for code is that they’ll make mistakes that aren’t instantly caught by the language compiler or interpreter. And these happen all the time!