by osigurdson 11 days ago
Definitely agree that performance optimization is a good use case for LLMs. Here you have both a measurable goal / objective function and guardrails against functional regressions. It kind of closes the loop in that regard.
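As a rough sketch of that closed loop (all names here are hypothetical, and the "tests" are simply agreement with a trusted baseline on a few inputs), a candidate optimization is accepted only when it passes the guardrail and improves the measured objective:

```python
import timeit

def accept_candidate(baseline, candidate, cases, repeats=3, number=5):
    """Accept a candidate only if it matches the baseline and runs faster.

    `baseline`, `candidate`, and `cases` are illustrative assumptions:
    two callables with the same signature and a list of argument tuples.
    """
    # Guardrail: any functional regression rejects the candidate outright.
    for args in cases:
        if candidate(*args) != baseline(*args):
            return False

    # Objective function: best-of-N wall-clock time over all cases.
    def best_time(fn):
        return min(timeit.repeat(lambda: [fn(*a) for a in cases],
                                 repeat=repeats, number=number))

    return best_time(candidate) < best_time(baseline)

def slow_sum(n):
    # Reference implementation: O(n) loop.
    total = 0
    for i in range(n):
        total += i
    return total

def fast_sum(n):
    # Correct closed form for 0 + 1 + ... + (n - 1).
    return n * (n - 1) // 2

def wrong_sum(n):
    # Fast but functionally wrong; the guardrail should reject it.
    return n * n // 2

CASES = [(10_000,), (50_000,)]
```

Here `accept_candidate(slow_sum, fast_sum, CASES)` should come back true, while `wrong_sum` is rejected before any timing happens.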

One thing, however, is that a test suite is not usually exhaustive in the sense that any code that passes the tests is valid. Usually tests are more complementary in nature. Therefore you could still get code degradation.
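A small illustration of that gap, using a made-up example rather than anything from a real project: the buggy function below passes every case in the suite and is still wrong.

```python
def is_leap_year_buggy(year: int) -> bool:
    # Ignores the century rule, so 1900 is wrongly treated as a leap year.
    return year % 4 == 0

def is_leap_year(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# A plausible but non-exhaustive suite (hypothetical cases).
SUITE = [(2024, True), (2023, False), (2000, True), (1996, True)]

def passes_suite(fn) -> bool:
    # True when the function agrees with every expected value in SUITE.
    return all(fn(year) == expected for year, expected in SUITE)
```

Both implementations pass `SUITE`, but they disagree on 1900, which the suite never checks.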


> One thing, however, is that a test suite is not usually exhaustive in the sense that any code that passes the tests is valid. Usually tests are more complementary in nature.

Not in the world of AI - if your tests don't catch any known issues, the problem is the tests aren't comprehensive enough. There's no excuse at this point not to have an incredibly comprehensive test suite to go with your other agent feedback loop constraints.

>> if your tests don't catch any known issues, the problem is the tests aren't comprehensive enough.

Maybe I misunderstand, but this seems like a fairly low bar, in that the test suite only covers existing bugs.

I'd argue that if you aren't going to look at the code, you actually need a fully comprehensive test suite - in the sense that if the tests pass, the code is correct and you don't have to look at it at all. The problem is that such a suite doesn't seem very quick to create. Of course, if there is a way to do it quickly, in a way that is reproducible by others, I'd love to hear about it.

I don't mean just bugs, I mean any known issues. I test infra, I test UI, I test binary protocols, you name it. There is certainly no fast way to do it, even with AI (an AI-generated suite is better than nothing, but not as good), and it's a serious investment, but it's worth it. Testing becomes a process of correctness checking that snowballs over time, making everything else easier and better (or else the tests need further adjustment!)

Right. You mean all behaviors are tested, essentially.

So if you or your team are going to implement a new feature, what does that look like? Do you write Gherkin or similar, unit tests, or both? Can you provide an example of what that might look like? How much of this has changed for you since the pre-AI days?

These days, yes: an integration test at the high level (usually a 1-to-3 liner), then unit tests as I go, and often some mocked functional tests. This is basically the same as before, just a ton faster in the AI days. You have to hold the AI accountable, demand quality, and iterate, but this weekend I've built an entire test suite for a monorepo I just started working on. It's garbage quality but better than no tests, of course, and it will improve as I work.

You can find some open source examples on GitHub, either directly at https://github.com/pgdogdev/pgdog/commits/main/?author=jagge... or through my profile. That repo has a pure-SQL integration suite I wrote essentially entirely with AI: https://github.com/pgdogdev/pgdog/tree/main/integration/sql

There's also older work on GitHub you can see over the years, a mishmash and grab bag. I would prefer if more of my work were open source, but somehow most employers still default to closed source.

Edit: While I'm thinking about it, the other thing you can do with AI is demand that it TDD things. I'm more of a "test all the fucking time" adherent and don't care whether the tests are written first, but AI is perfectly happy to skate by with a tautological test unless you make it write the test first, ensure it fails correctly, make your change, and don't let it modify the test.