Hacker News new | ask | show | jobs
by afro88 340 days ago
> Error rates compound exponentially in multi-step workflows. 95% reliability per step = 36% success over 20 steps. Production needs 99.9%+.

This misses a key feature of agents though. They get feedback from linters, build logs, test runs and even screenshots. And they collect this feedback themselves. This means they can error correct some mistakes along the way.

The math works out differently, depending on how well it can collect automated feedback it is doing what you want.

1 comments

Correct, I think it is better to see it as multiple stages. Investigation stage might spin off tasks to read files, perform searches online, summarise the request. Then ‘main stage’ where it performs changes. Afterwards indeed the testing+fixing stage where it verifies the results and potentially performs a couple fixes. These plans are predictable and the models learn which steps are relevant first particular projects.

For context, relevant information from steps can be cherrypicked to next stage.

The math works differently because AI (mostly) ignores irrelevant results. So steps actually increase reliability overall.