by tptacek
383 days ago
No, what's happening here is we're talking past each other. An agent lints and compiles code. The LLM is stochastic and unreliable. The agent is ~200 lines of Python code that checks the exit code of the compiler and relays it back to the LLM. You can easily fool an LLM. You can't fool the compiler. I didn't say anything about whether code needs to be reviewed line-by-line by humans. I review LLM code line-by-line. Lots of code that compiles clean is nonetheless horrible. But none of it includes hallucinated API calls. Also, where did this "you seem to have a fundamental belief" stuff come from? You had like 35 words to go on.
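For anyone who wants to see the shape of the loop being described, here is a rough sketch, not the code of any particular agent. `ask_llm` is a hypothetical stand-in for the model call, and Python's own `py_compile` plays the role of "the compiler" only so the example is self-contained. Note the caveat: byte-compiling Python catches syntax errors but does not resolve names, so rejecting a call to a nonexistent API needs a compiler or type checker that actually checks symbols.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_compiler(source: str) -> Tuple[int, str]:
    """Byte-compile `source` in a subprocess and return (exit code, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source)
        proc = subprocess.run(
            [sys.executable, "-m", "py_compile", str(path)],
            capture_output=True,
            text=True,
        )
    return proc.returncode, proc.stderr


def agent_loop(ask_llm: Callable[[str], str], max_attempts: int = 3) -> Optional[str]:
    """Ask the (hypothetical) LLM for code until the compiler exits with 0.

    `ask_llm` receives the previous compiler output (empty on the first try)
    and returns a new candidate source string.
    """
    feedback = ""
    for _ in range(max_attempts):
        source = ask_llm(feedback)
        exit_code, stderr = run_compiler(source)
        if exit_code == 0:
            return source      # the compiler accepted it; the loop is done
        feedback = stderr      # relay the compiler's complaint to the LLM
    return None                # give up after max_attempts
```

The point of the design is that the accept/reject decision comes from the exit code, which the model cannot talk its way around. Whether the accepted code is any good is a separate question.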
The LLM can easily hallucinate code that will satisfy the agent and the compiler but will still fail the actual intent of the user.
> I review LLM code line-by-line. Lots of code that compiles clean is nonetheless horrible.
Indeed, most code that LLMs generate compiles clean and is nevertheless horrible! I'm happy that you recognize this truth, but the fact that you review that LLM-generated code line-by-line makes you an extraordinary exception compared with the normal user, who generates LLM code and absolutely does not review it line-by-line.
> But none of [the LLM generated code] includes hallucinated API calls.
Hallucinated API calls are just one of many possible kinds of hallucinated code that an LLM can generate. By no means does "hallucinated code" describe only "hallucinated API calls"!
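A made-up illustration of the kind of thing I mean, assuming the user asked for a median function. The `median` below is hypothetical, invented for this example. It calls no nonexistent API and compiles without complaint, yet it does not do what was asked:

```python
CANDIDATE = '''
def median(values):
    # Bug: never sorts, so this returns the middle element of the input order.
    return values[len(values) // 2]
'''

# compile() raises SyntaxError on invalid source; this candidate passes.
code = compile(CANDIDATE, "<candidate>", "exec")

namespace = {}
exec(code, namespace)

result = namespace["median"]([5, 1, 3])
print(result)  # prints 1, but the median of [5, 1, 3] is 3
```

A check that only looks at whether the code compiles sees nothing wrong here. Only a test or a human reading the code against the intent would catch it.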