by layer8
173 days ago
> We can just run the code and see if the output is what we expected

There is a vast gap between the output happening to be what you expect and code being actually correct. That is, in a way, also the fundamental issue with LLMs: they are designed to produce “expected” output, not correct output.
|
The output is correct but only for one input.
The output is correct for all inputs but only with the mocked dependency.
The output looks correct but the downstream processors expected something else.
The output is correct for all inputs with real-world dependencies and is in the correct structure for downstream processors, but it's not being registered with the schema filter and it all gets deleted in prod.
While implementing the correct function, you fail to notice that the correct-in-every-way output doesn't conform to that thing Tom said, because you didn't code it yourself but let the LLM do it. The system works flawlessly with itself, but the final output fails regulatory compliance.
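
To make the first case concrete ("correct but only for one input"), here is a minimal sketch. The function `buggy_median` and the sample inputs are made up for illustration; only `statistics.median` is a real standard-library call, used here as the trusted reference.

```python
import statistics


def buggy_median(values):
    """Hypothetical implementation: returns the middle element of the sorted list.

    This is only right for odd-length inputs; even-length inputs need the
    mean of the two middle elements.
    """
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# "Run the code and see if the output is what we expected": this passes.
assert buggy_median([3, 1, 2]) == 2

# Comparing against a trusted reference over more inputs exposes the gap.
samples = [[3, 1, 2], [1, 2, 3, 4], [5], [10, 20]]
failures = [s for s in samples if buggy_median(s) != statistics.median(s)]
print(failures)
```

The single spot check says nothing about the even-length inputs that end up in `failures`, and none of this touches the later cases (mocked dependencies, downstream expectations, compliance), which need checks outside the function itself.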