| HN Mirror

Most LLM tasks can be expressed as a DAG, but the odds of it succeeding go way, way up if you drop the acyclic requirement (eg a “run tests, if they fail, fix it and loop back to running the tests” stage).

And it gets delegated to context because it’s either to have another session and tell it to double check and critique the first LLM than it is to write a deterministic test for every prompt. Like if I want a new form that sends a REST request on submit, I can have two LLMs duking it out in 5 minutes. If I have to write Selenium tests then I might as well just write the feature. Or I can have an LLM write the tests, but that’s more or less the same as letting a second LLM judge the first.