by sarchertech 99 days ago
1. Agents aren’t humans. A human can write a working 100k LOC application with zero tests (not saying they should but they could and have). An agent cannot do this.

Agents require tests to keep them from spinning out, and your tests do not cover all of the behaviors you care about.

2. If you doubt that your tests fail to cover all your requirements, consider that 99.9% of the production bugs you’ve ever had passed your test suite.

I have never known a human that could or did write 100K lines of bug free working code without running parts of it first and testing.

So humans also don’t write bug free code or tests that cover all use cases - how is that an argument that humans are better?

I'm not saying that humans can write 100k-line programs bug free or without running parts of them.

An AI cannot write a 100k-line program on its own without external guardrails; otherwise it spins out. This has nothing to do with whether the agent is allowed to run the code itself. This is well documented. Look at what was required to allow Claude to write a "C compiler".

This has nothing to do with whether it's bug free. It literally can't produce a working 100k LOC program without external guardrails.

Absolutely no one is arguing that you shouldn’t have a combination of manual and automated tests around either AI or human generated code or that you shouldn’t have a thoughtful design.

In a non-trivial app, you can't test your way through all of the e2e workflows, and thoughtful design isn't what I'm talking about.

How many bugs have you seen that passed your automated and manual testing? Probably 99.9% of them.

Now imagine that you take those same test suites, unleash an agent with far worse reasoning capabilities than a human on the code, and tell it that it can change anything in the code as long as the tests pass.
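
To make that concrete, here's a minimal sketch. Everything in it is made up for illustration (the `apply_discount_*` functions and the tiny `existing_test_suite` are hypothetical, not from any real codebase). It assumes a function whose original version clamps the discount percentage, a test suite that only checks the happy path, and a rewrite that drops the clamp while "simplifying":

```python
def apply_discount_original(price: float, percent: float) -> float:
    """Original version: clamps the discount to the 0-100 range."""
    percent = max(0.0, min(100.0, percent))
    return round(price * (1 - percent / 100), 2)


def apply_discount_rewritten(price: float, percent: float) -> float:
    """Hypothetical rewrite: same happy-path results, but the clamp is gone."""
    return round(price * (1 - percent / 100), 2)


def existing_test_suite(apply_discount) -> bool:
    """The only tests anyone wrote: ordinary discounts between 0 and 100."""
    return (
        apply_discount(100.0, 10.0) == 90.0
        and apply_discount(80.0, 25.0) == 60.0
        and apply_discount(50.0, 0.0) == 50.0
    )


if __name__ == "__main__":
    # Both versions pass the suite, so "the tests pass" can't tell them apart.
    print(existing_test_suite(apply_discount_original))
    print(existing_test_suite(apply_discount_rewritten))

    # The untested behavior differs: a 150% discount used to floor at zero,
    # and after the rewrite it produces a negative price.
    print(apply_discount_original(100.0, 150.0))
    print(apply_discount_rewritten(100.0, 150.0))
```

Both versions are green against the suite, and the only way to notice the difference is to read the diff or to already have a test for the out-of-range case.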

So if bugs pass through testing, which they always have, wouldn’t that imply that humans are just as fallible as AI - and slower?

I never suggested letting agents code for a day on end. I use AI to code well defined tasks and treat it like a mid-level ticket taker.

If you have an employee who codes 2x faster than everyone else but produces 10x the bugs, would your suggestion be to let him rip and stop reviewing his code output?

> I never suggested letting agents code for a day on end. I use AI to code well defined tasks and treat it like a mid level ticket taker

It doesn’t matter how long you’re letting it run. If you aren’t reviewing the output, you have no way of knowing when it changes untested behavior.

I regularly find Claude doing insane things that I never would have thought to test against, that would have made it into prod if I hadn’t reviewed the code.