by simonw 295 days ago
I wonder if the author is using automated tests.

My hunch is that good automated testing is an enormous factor with respect to how productive you can get with coding agent tools.

Thorough tests? Just as when working without LLMs, you can confidently make changes without fear of breaking other parts of the application.

No tests at all? Any change you make is a roll of the dice with respect to how it affects the rest of your existing code.

2 comments

I’m reaching the same conclusion… I have been subscribing to LLMs for a couple of years, and trying to find the right balance and workflow that gets the best out of human and machine.

I now think TDD can play a big part. I don’t have much of a background in unit testing. For a recent TypeScript utility mini project, I took an outside-in approach using mocks where necessary. This started as a prototyping and modelling phase, getting the design right before committing to implementation code. This was about refining the types and function signatures, and mocking the components that didn’t exist at that point. The LLM didn’t have involvement at this stage, as it was about the problem domain, the shape and flow of the data.

Moving on from there, I was able to save a lot of time because SuperMaven in Cursor had enough context and understanding at that point to make very precise guesses about what I wanted, so I could tab autocomplete through a reasonable amount of boilerplate implementation code. I was also able to get away with writing a couple of happy path tests for most components, and get the agentic LLM to generate sad path tests. I kept most of those, including one that smoked out a flaw in my design.
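
To make the outside-in idea concrete, here is a rough sketch of what that modelling phase might look like. Everything in it is hypothetical: the currency-conversion domain and the names `RateSource`, `ConvertResult`, `convert` and `mockRates` are made up for illustration and are not from the project described above. The point is only that the types and signature come first, the missing component is mocked, and the human writes a happy path check while sad path checks can be generated afterwards.

```typescript
// Hypothetical example: none of these names come from the original project.

// A dependency that does not exist yet; only its shape is decided up front.
interface RateSource {
  // Returns the exchange rate for a currency code, or undefined if unknown.
  getRate(currency: string): number | undefined;
}

// The result type makes failure cases explicit in the signature.
type ConvertResult =
  | { ok: true; amount: number }
  | { ok: false; error: string };

// Signature settled during modelling, before RateSource has a real implementation.
function convert(amount: number, currency: string, rates: RateSource): ConvertResult {
  if (!Number.isFinite(amount) || amount < 0) {
    return { ok: false, error: "invalid amount" };
  }
  const rate = rates.getRate(currency);
  if (rate === undefined) {
    return { ok: false, error: `unknown currency: ${currency}` };
  }
  return { ok: true, amount: amount * rate };
}

// Hand-written mock standing in for the component that is not built yet.
const mockRates: RateSource = {
  getRate: (currency) => (currency === "EUR" ? 2 : undefined),
};

// Happy path: the kind of case the human writes first.
const happy = convert(10, "EUR", mockRates);

// Sad paths: the kind of cases one might ask the LLM to generate.
const unknownCurrency = convert(10, "XYZ", mockRates);
const negative = convert(-1, "EUR", mockRates);
```

In a real project these checks would live in whatever test runner is already in use; plain values are used here only to keep the sketch self-contained.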

That’s essentially the process I’m gravitating towards. Human begins the process, models the design, sets the constraints, and then the LLM saves time in a limited and supervised way whilst being kept on a short leash.

I don’t have much in the way of tests right now, but I am building with TypeScript and Rust, so that catches many basic bugs.

I don’t find the issue to be breaking other parts of the app, more so that new features don’t work as advertised by Claude.

One of my takeaways here is that I should give Claude an integration test harness and tell it that it must finish running that successfully before committing any code.
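
A minimal sketch of what such a gate could look like, under some assumptions: `Check`, `CheckResult` and `runGate` are hypothetical names, and each check is modelled as a plain function returning a boolean. In a real setup each `run` would presumably shell out to the integration test command, and the script would be wired into something like a pre-commit hook; neither is shown here.

```typescript
// Hypothetical commit gate: all names are illustrative.

type Check = { name: string; run: () => boolean };
type CheckResult = { name: string; passed: boolean };

// Runs every check and allows the commit only if all of them pass.
// A check that throws counts as a failure, and an empty list is rejected
// so that "no tests configured" cannot be mistaken for "all tests passed".
function runGate(checks: Check[]): { allowed: boolean; results: CheckResult[] } {
  const results = checks.map((check) => {
    let passed = false;
    try {
      passed = check.run();
    } catch {
      passed = false;
    }
    return { name: check.name, passed };
  });
  const allowed = results.length > 0 && results.every((r) => r.passed);
  return { allowed, results };
}

// Usage: stand-in checks; a real one would invoke the test harness.
const gate = runGate([
  { name: "integration tests", run: () => true },
  { name: "type check", run: () => true },
]);
console.log(gate.allowed ? "commit allowed" : "commit blocked");
```

Treating an empty check list as a failure is a deliberate choice in this sketch, since an agent that skips or removes the tests should not get a green light.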