by embedding-shape 144 days ago
You didn't actually just say "write tests" though right? What was the actual prompt you used?

I feel like that matters more than the tooling at this point.

I can't really understand letting LLMs decide what to test; they seem to completely miss the boat when it comes to testing. Half of the tests they write are useless because they duplicate the code they test, and the other half don't test what they should be testing. There are so many shortcuts, and LLMs require A LOT of hand-holding when writing tests, more so than for other code, I'd wager.
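To make the "duplicate what they test" complaint concrete, here is a minimal sketch. The function `apply_discount` and both tests are hypothetical, made up purely for illustration and not taken from anyone's actual repository.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_duplicates_implementation():
    # Weak: the expected value is recomputed with the same formula as the
    # implementation, so a wrong formula would be wrong in both places
    # and the test would still pass.
    price, percent = 80.0, 25.0
    assert apply_discount(price, percent) == round(price * (1 - percent / 100), 2)


def test_pins_known_values():
    # Stronger: expected values are worked out by hand, independently
    # of the code, including the two boundary percentages.
    assert apply_discount(80.0, 25.0) == 60.0
    assert apply_discount(80.0, 0.0) == 80.0
    assert apply_discount(80.0, 100.0) == 0.0
```

Both tests pass today, but only the second one would catch a bug in the formula itself.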

2 comments

There are a lot of comments on HN and other places breathlessly gushing about agents totally doing everything end to end, so I couldn't blame someone new to this space for naively assuming that agents would be able to handle a well-bounded problem such as test coverage reasonably well.
> naively assuming that agents would be able to handle a well-bounded problem such as test coverage reasonably well.

We haven't figured out a way for humans to do that well :P I still see people arguing that "80% test coverage is obviously better than 70%" and similar dumb sentiments that completely miss the point.

But I agree with the first part: LLMs are massively oversold, and it's hard to blame users for believing the hype. Tempered expectations win, as always.

No, that was an exaggeration. The prompt was decent. I explained the point of the repository, that I wanted full coverage with tests, that it could keep going until it worked. Maybe that was still not enough. With how others talk about it, I must be missing something.
For tests, you need to be precise about what it should test, how it should test it, and what the assertions should be. Otherwise you'll mostly get trash; they're exceptionally horrible at writing tests. Which makes sense, since most programmers are too, but given the importance of correct tests, it's probably the part that needs the most human handholding right now.
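One way to be that precise is to hand over the cases yourself, so the what, the how, and the assertions are all spelled out. A rough sketch, assuming a hypothetical `slugify` function; the names and cases here are invented for illustration only.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# What to test: slugify.
# How to test it: call it directly, no mocks.
# Assertions: the exact output for each input listed here.
CASES = [
    ("Hello World", "hello-world"),
    ("  padded  ", "padded"),
    ("Already-Slugged", "already-slugged"),
    ("C++ & Rust!", "c-rust"),
    ("", ""),
]


def run_cases():
    """Return a list of (input, expected, actual) for every failing case."""
    failures = []
    for title, expected in CASES:
        actual = slugify(title)
        if actual != expected:
            failures.append((title, expected, actual))
    return failures
```

With a table like this in the prompt, the model only has to wire the cases into the test framework instead of guessing what matters.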