| HN Mirror

its those kinds of examples that make it hard to cleave a measurement of success.

Based on those kinds of results an LLM should, in theory, be able to plan, analyze and suggest improvements, without the need for human intervention.

You will see rudimentary success for this as well - however, when you push the tool further, it will stop being... "logical".

I'd refine the point to saying that you will get some low hanging fruit in terms of syntactic prediction and semantic analysis.

But when you lean ON semantic ability, the model is no longer leaning on its syntactic data set, and it fails to generalize.