Hacker News
by abeppu 352 days ago
I think we should shift the focus away from adapting LLMs to our purposes (e.g. external tool use) and adapting how we think about software, and toward getting models that internally understand compilation and execution. Rather than merely building around next-token prediction, the industry should take advantage of the fact that software in particular provides a cheap path to learning a domain-specific "world model".

Currently I sometimes get predictions where a variable that doesn't exist gets used, or a method call doesn't match the signature. The text of the code might look pretty plausible, but it's only relatively late, when a tool is invoked, that anything flags the problem.

What if, instead of just code text, we trained a model on (code text, IR, bytecode) tuples, (bytecode, fuzzer inputs, execution trace) examples, and (trace, natural language description) annotations? The model would need to understand not just which token sequences seem likely but (a) what the code will compile to, (b) what the code _does_, and (c) how a human would describe this behavior. Bonus points for some path to tie in pre/post conditions, invariants, etc.
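To make the tuple idea concrete, here is a rough sketch of how such examples could be collected for one toy function. It is only an illustration under several assumptions: CPython opcode names stand in for "IR, bytecode", a seeded random generator stands in for a real fuzzer, and the names `clamp`, `CompileExample`, `TraceExample`, and `build_examples` are all made up for this example.

```python
import dis
import random
import sys
from dataclasses import dataclass

# Toy program under study (hypothetical); any small pure function would do.
SOURCE = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""

@dataclass
class CompileExample:
    code_text: str
    bytecode: list  # CPython opcode names, a stand-in for IR/bytecode

@dataclass
class TraceExample:
    inputs: tuple
    trace: list     # source line numbers in execution order
    result: object

def compile_example(source, name):
    """Compile the source and pair its text with the opcodes it produces."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    fn = namespace[name]
    ops = [ins.opname for ins in dis.get_instructions(fn)]
    return fn, CompileExample(source, ops)

def trace_example(fn, inputs):
    """Run fn on inputs and record which of its source lines executed."""
    lines = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            lines.append(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*inputs)
    finally:
        sys.settrace(None)
    return TraceExample(inputs, lines, result)

def build_examples(n=5, seed=0):
    """Seeded random inputs act as a crude stand-in for a fuzzer."""
    rng = random.Random(seed)
    fn, compiled = compile_example(SOURCE, "clamp")
    traces = [trace_example(fn, (rng.randint(-20, 20), 0, 10)) for _ in range(n)]
    return compiled, traces
```

Calling `build_examples()` gives one (code text, bytecode) pair plus several (inputs, trace, result) records. The (trace, natural language description) annotations are not generated here; they would have to come from people or from another labeling step.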

"People need to adapt to weaker abstractions in the LLM era" is a short-term coping strategy. Making models that can reason about abstractions in a much tighter, higher-fidelity loop may get us code generation we can trust.

1 comment

These types of errors are not only rare in one-shots, but also very easy to fix in subsequent iterations - e.g. Claude Code with Sonnet rarely makes these errors.