| HN Mirror


by agentultra 39 days ago

With the invariant stated only informally, it would be hard for a human reviewer in the loop to prove that it has been broken. Spoken language isn’t precise enough for the task.

Even if you could state it in a precise formal language, the LLM under the agent doesn’t have the capability to understand what the invariant is for and why it’s important. You’ll still get oddly generated code. You might get an LLM that can associate certain tokens with those in the formal language specification, which can hold invariants and perhaps even write the proofs… but you’ll still get a whole bunch of other code generated from the informal parts of the prompt.

I agree that simply adding constraints and prompts to your skills and specs isn’t going to prevent these things. Worse, even if you could invent a better mousetrap, the creature will still escape.

The problem is… “elongation”: the addition of code for the sake of the prompt/task/etc. Often less is better. This takes a human with the ability to anticipate what other humans would want/expect. When you need a generator, they’re great, but it’s a firehose whose use should be restrained a little more.


pron 39 days ago

> With the invariant stated only informally, it would be hard for a human reviewer in the loop to prove that it has been broken. Spoken language isn’t precise enough for the task.

That depends on the invariant. Some are behavioural, like "variable x must be even if y is positive", but some are architectural, such as "a new view requires a new class".
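To make the behavioural case concrete, here is a minimal sketch of how the first invariant could be stated precisely enough to check mechanically. The function name `holds` and the sample states are hypothetical, chosen only for illustration. The architectural invariant has no comparably short executable form.

```python
def holds(x: int, y: int) -> bool:
    """Behavioural invariant: x must be even if y is positive."""
    # "y > 0 implies x is even" is equivalent to "y <= 0 or x is even".
    return y <= 0 or x % 2 == 0


# Hypothetical sample states, used only to exercise the check.
states = [(4, 1), (3, 1), (3, 0), (3, -2)]
violations = [(x, y) for (x, y) in states if not holds(x, y)]
print(violations)  # → [(3, 1)]
```

A check like this tells you when the invariant is broken, but says nothing about whether the invariant is still the right one to keep.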

But that's only one side of the problem because maintaining the invariant can be just as bad as breaking it. You ask the agent to add a feature and it may well maintain the invariant - only it shouldn't have, because the feature uncovers the fact that the invariant is architecturally wrong.

The problem is that evolving software requires exercising judgment about when you need to follow the existing strategy and when you need to rethink it. If there is any mechanical rule that could state what the right judgment is, I don't know what it is.


agentultra 39 days ago

Yes! I was trying to make this part of my point but you definitely made it much more clear and concise.

With a skilled operator, it could be possible to drive an agent to handle these kinds of changes. I would be concerned that spoken language wouldn't be precise enough to handle the refactoring and other changes a code base needs when an invariant changes... regardless of whether it was a property, architectural, or procedural change. It can already take several prompts and burn quite a few tokens to do large-scale rewrites and code changes. Maybe the parameters and weights can be tuned for this kind of work, but I remain skeptical that what we have at present is "efficient" at it.
