Hacker News
by krilcebre 93 days ago
You are comparing compilers to a completely non-deterministic code generation tool that often does not take observable behavior into account at all and will happily screw up a part of your system without you noticing, because you misworded a single prompt.

No amount of unit/integration tests cover every single use case in sufficiently complex software, so you cannot rely on that alone.

1 comment

I just rewrote a utility for the third time - the first two were before AI.

Short version: when someone designs a call center with Amazon Connect, they use a GUI flowchart tool to create “contact flows”. You can export a flow to JSON, but it isn’t portable to other environments without some remapping. I had previously created a tool that used the API to export a flow and create a portable CloudFormation template.
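To make the remapping concrete, here is a minimal sketch of the idea, not the actual tool. It assumes the only environment-specific value is the Connect instance ARN, which it swaps for a template parameter. The function name, the parameter name ConnectInstanceArn and the sample flow are all made up for illustration; a real flow usually has more to remap (Lambda ARNs, prompts, and so on), and any literal `${` in the flow would need escaping for Fn::Sub.

```python
import json


def make_portable_template(flow_name, flow_content, instance_arn):
    """Wrap exported contact flow JSON in a CloudFormation template (sketch).

    Every occurrence of the source instance ARN is replaced with a
    ${ConnectInstanceArn} placeholder that Fn::Sub resolves at deploy time.
    """
    # Round-trip through json so a malformed export fails here, not at deploy.
    content = json.dumps(json.loads(flow_content))
    portable = content.replace(instance_arn, "${ConnectInstanceArn}")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        # Hypothetical parameter name; supply the target instance ARN per environment.
        "Parameters": {"ConnectInstanceArn": {"Type": "String"}},
        "Resources": {
            "ContactFlow": {
                "Type": "AWS::Connect::ContactFlow",
                "Properties": {
                    "InstanceArn": {"Ref": "ConnectInstanceArn"},
                    "Name": flow_name,
                    "Type": "CONTACT_FLOW",
                    "Content": {"Fn::Sub": portable},
                },
            }
        },
    }


# Made-up sample export with one queue reference.
SOURCE_ARN = "arn:aws:connect:us-east-1:111122223333:instance/abc"
exported = json.dumps(
    {"Actions": [{"Parameters": {"QueueId": SOURCE_ARN + "/queue/q1"}}]}
)
template = make_portable_template("Main", exported, SOURCE_ARN)
print(json.dumps(template, indent=2))
```

The nuance mentioned next is exactly what a sketch like this misses: a plain string replace only handles the references you already know about.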

I always miss some nuance. About half of those get caught by running the official CloudFormation linter, and the other half by actually deploying the template and seeing what errors come back.

This time I did it with Claude Code. Ironically enough, it knew some of the complexity because it had been trained on one of my older open source implementations, which I wrote while at AWS. But I told it to read the official CloudFormation spec and, after every change, test the output with the linter, try to deploy it, and fix whatever failed.

Again, I didn’t care about the code - I cared about results. The output of the script either passes the deployment or it doesn’t. Claude iterated until it got it right based on “observable behavior”. Plenty of times, Claude has tested whether my deployments were working as expected, either by calling the appropriate AWS CLI command or by reading from a dev database based on integration tests I defined, and then fixed things.
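For anyone who wants the pass/fail gate spelled out, this is roughly the check being iterated against, as a sketch rather than what Claude actually ran. It assumes cfn-lint and the AWS CLI are installed and configured; the function name, template path and stack name are placeholders. The run argument exists only so the gate can be exercised without touching AWS.

```python
import subprocess


def check_template(template_path, stack_name, run=subprocess.run):
    """Lint, then deploy. Return (stage, output) for the first failure, else None."""
    stages = [
        ("lint", ["cfn-lint", template_path]),
        ("deploy", ["aws", "cloudformation", "deploy",
                    "--template-file", template_path,
                    "--stack-name", stack_name]),
    ]
    for stage, cmd in stages:
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # This output is what gets fed back into the next fix attempt.
            return stage, (result.stdout + result.stderr).strip()
    return None


if __name__ == "__main__":
    # Placeholder path and stack name.
    failure = check_template("contact-flow.yaml", "connect-flows-dev")
    print("passed" if failure is None else f"{failure[0]} failed:\n{failure[1]}")
```

The point is that the gate is binary and external to the model: either both commands exit cleanly or the error text goes back in for another round.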