Hacker News
by UncleMeat 1177 days ago
This is the one that is most scary to me as a user of software. Have the AI write some of the code, sure. But tests have to be correct. Autogenerating a mountain of mostly-correct tests strikes me as a great way of ending up with surprising behavior that is a nightmare to untangle.
8 comments

> But tests have to be correct.

From my experience, ChatGPT-3 has been pretty good at exercising all meaningful branches (I look at code coverage results, I don't care for percentages at all), in the fewest tests, on the first go. I definitely have to modify each test quite a bit, because it frequently hallucinates API calls that don't exist, but the code that it produces is an incredible blueprint. And I haven't even attempted to use RCI yet: "improve your answer" or "your answer is wrong because..."; ChatGPT-4 is supposedly extremely adept at reflecting on its responses. I can only imagine where this will be in a few years.

I was about 6 months late to Copilot because I was incredibly skeptical about it, without having used it in anger. My skepticism was mostly (but not entirely) wrong. Having actually used ChatGPT in anger, I find the degree of skepticism extremely suspect.

It's like picking up C in the 1970s. We're at the very beginning when things are pretty rough, but the skills that I am building today are going to be foundational in the future. If you're dismissing AI without giving it a few weeks to earn its keep, it's going to be rough to catch up when things improve to the point where it is required.

If you write a unit test for each branch in your method and just put in the current behavior as the expectation, all you've done is create a test that says "the method does what it currently does."

> the method does what it currently does.

It isn't that stupid, and that was done at least a decade ago with static+control flow analysis. As for AI, I was recently writing tests for a VT push parser in Rust (which is novel code, so no parroting here) and it clearly knew enough about VT to write a correctly failing test. I had a bug in my parser, and the test that AI generated found it.
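
To make the difference concrete, here is a minimal sketch in Python (not the Rust parser mentioned above). Everything in it is hypothetical: `parse_params` is a made-up, deliberately buggy helper for splitting CSI-style parameters, and treating an omitted parameter as 0 is a simplifying assumption. One check pins whatever the code currently returns; the other takes its expectation from the convention, so it fails and exposes the bug.

```python
from typing import Callable


def parse_params(raw: str) -> list[int]:
    """Hypothetical parser for CSI-style parameters such as "1;31"."""
    # Deliberate bug: empty fields are dropped instead of kept as defaults.
    return [int(field) for field in raw.split(";") if field]


def check_pinned_behavior() -> None:
    # Expectation copied from what the code returns today.
    assert parse_params(";5") == [5]


def check_from_spec() -> None:
    # Expectation taken from the convention (assumed here) that an
    # omitted parameter stands for a default value, written as 0.
    assert parse_params(";5") == [0, 5]


def passes(check: Callable[[], None]) -> bool:
    """Run a check and report whether its assertion held."""
    try:
        check()
    except AssertionError:
        return False
    return True


if __name__ == "__main__":
    print("pinned:", passes(check_pinned_behavior))
    print("from spec:", passes(check_from_spec))
```

The pinned check passes against the buggy code, while the spec-derived check fails, which is the "correctly failing test" case.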

At the end of the day, I'm not sure why anyone would believe the critique of someone who hasn't used a tool in earnest.

That's not necessarily true. Take snapshot testing, for instance: although edge cases may be missed, the cases themselves cannot be correct or incorrect.
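
A rough sketch of what that looks like, assuming a hand-rolled helper rather than any particular snapshot library (`matches_snapshot` and `render_report` are made-up names for illustration). The first run records whatever the code produces; later runs only report whether the output changed.

```python
import json
import tempfile
from pathlib import Path


def render_report(items: list[str]) -> dict:
    """Hypothetical function under test."""
    return {"count": len(items), "items": sorted(items)}


def matches_snapshot(name: str, value: object, snapshot_dir: Path) -> bool:
    """Record `value` on the first run; compare against the record afterwards."""
    path = snapshot_dir / f"{name}.json"
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        # First run: whatever the code does right now becomes the baseline.
        path.write_text(serialized)
        return True
    return path.read_text() == serialized


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshots = Path(tmp)
        # Records the baseline.
        print(matches_snapshot("report", render_report(["b", "a"]), snapshots))
        # Same output as the baseline.
        print(matches_snapshot("report", render_report(["a", "b"]), snapshots))
        # Output differs from the baseline.
        print(matches_snapshot("report", render_report(["a"]), snapshots))
```

Note that the helper never judges the baseline itself, only whether later output differs from it.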

Or you could have the AI write test cases (writing them out is often the most laborious part) and then validate them by hand. That'd be little different than writing them yourself, though again edge cases may be missed. You just skip the un-fun part of repetitively typing out the code for each case.

Tests are easy to verify; if they aren't, they are probably bad tests. Using AI for stuff that's easy to verify has been good so far.

Knowing what tests to write is another story.

My problem with AI generated tests is that they lead to over testing and it bogs you down once you have to do refactoring. Ideally detailed tests should come once you're on like the 3rd iteration and really sure you've nailed the design (oh I hate TDD if it isn't obvious). With AI I'm getting detailed tests and first try implementations, bad code locked in everywhere :(

More tests != better code, tests are still code - the less code you have to satisfy some goal the better.

I see the assumption that automation means it removes the human from the task. The article is precisely about a human still intermediating. Writing tests doesn’t mean blindly accepting the test, it means the boilerplate of the test is auto completed based on a human description, then a human verifies the correctness of the output. Will people take the shortcut? Of course. But AI didn’t create or magnify that problem. How many “return true” or incomplete tests or missing significant cases etc have you reviewed or stumbled across? Or worse: the test-free code base?

I've done a few rounds of "here is a spec. Write tests for it." ... "Here are some tests. Write an implementation which passes them."

That gets me checking the tests in the middle so I can fix them up if I need to.
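
A toy version of that loop, as a sketch only: the spec, `run_spec_checks`, and `clamp` are all invented for illustration, and in practice the checks would live in a real test framework. The point is that the checks sit between spec and implementation, where they can be read and fixed before any code is written against them.

```python
from typing import Callable

# Step 0: the spec handed over first (made up for this example).
SPEC = """
clamp(value, low, high):
  - returns value when low <= value <= high
  - returns low when value < low, and high when value > high
  - raises ValueError when low > high
"""


# Step 1: checks derived from the spec, reviewed by hand before moving on.
def run_spec_checks(clamp_fn: Callable[[int, int, int], int]) -> list[str]:
    """Return the names of the spec checks that `clamp_fn` fails."""
    failures = []
    if clamp_fn(5, 0, 10) != 5:
        failures.append("value in range is returned unchanged")
    if clamp_fn(-1, 0, 10) != 0:
        failures.append("value below range returns low")
    if clamp_fn(11, 0, 10) != 10:
        failures.append("value above range returns high")
    try:
        clamp_fn(1, 10, 0)
        failures.append("inverted bounds should raise")
    except ValueError:
        pass
    return failures


# Step 2: an implementation written to pass the reviewed checks.
def clamp(value: int, low: int, high: int) -> int:
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


if __name__ == "__main__":
    print(run_spec_checks(clamp))
```

An implementation that skips the bounds check would still pass the first three checks and fail only the last one, which is the kind of gap the review in the middle is meant to catch.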

It seems that the more useful way to use AI would be TDD, where you write the tests and perhaps a simple implementation, and AI makes the implementation good/fast and points out inconsistencies in the tests.

Of course I validate the tests. I get it to churn them out then go over them to make sure everything is sane and covered as expected. I'm surprised that people would assume otherwise.
For me, the alternative is no tests at all.