by tptacek
490 days ago
Hold on, hold on. You're missing a step here. I agree completely that an LLM's first attempt to write a Semgrep rule is likely as not to be horseshit. That's true of everything an LLM generates. But I'm talking about closed-loop LLM code generation. Unlike legal arguments and medical diagnoses, you can hook an LLM up to an execution environment and let it see what happens when the code it generates runs. It then iterates, until it has something that works. Which, when you think about it, is how a lot of human-generated code gets written too. So my thesis here does not depend on LLMs getting things right the first time, or without assistance.
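For concreteness, here is a rough sketch of what "closed loop" could look like. Everything in it is an assumption for illustration: `generate` is a hypothetical stand-in for a model call, `fake_llm` is a canned stub rather than a real API, and the task (writing an `add` function) is a toy. Note that the checks are fixed, human-written assertions run by the harness itself, not something the model reports.

```python
from typing import Callable, Optional

# Hypothetical stand-in for an LLM call: takes the task and the feedback from
# the previous failed run, returns candidate source code.
Generator = Callable[[str, str], str]


def run_checks(source: str) -> str:
    """Execute the candidate and run human-written checks.

    Returns an empty string on success, or an error description on failure.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # a real harness would sandbox this
        add = namespace["add"]
        assert add(2, 3) == 5, "add(2, 3) should be 5"
        assert add(-1, 1) == 0, "add(-1, 1) should be 0"
    except Exception as err:
        return f"{type(err).__name__}: {err}"
    return ""


def closed_loop(task: str, generate: Generator, max_rounds: int = 5) -> Optional[str]:
    """Ask for code, run it, feed failures back, stop when the checks pass."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        feedback = run_checks(candidate)
        if not feedback:
            return candidate
    return None  # gave up: nothing passed within the round budget


def fake_llm(task: str, feedback: str) -> str:
    """Canned stub: wrong on the first try, right once it sees an error."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


result = closed_loop("write add(a, b)", fake_llm)
```

With the stub above, the first candidate fails the first assertion and the second one passes. The loop is only as good as `run_checks`: if the checks are weak or wrong, it will happily converge on code that passes them anyway.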
To evaluate code, one has to know and understand what it is supposed to be doing. Or use tests.
But LLMs love to lie, so they can't be trusted to write the tests, or even to report whether the code they wrote passed the tests.
In my experience the way to use LLMs for coding is exactly the opposite: the user should already have very good knowledge of the problem domain and of the language used, and just need someone to talk through how to approach a specific implementation detail (or to help with an obscure syntax quirk). Then LLMs can be very useful.
But having them directly output code for things one doesn't know, in a language one doesn't know either, hoping they will magically solve the problem by iterating in "closed loops", will result in chaos.