Hacker News new | ask | show | jobs
by timbilt 556 days ago
> validates each test to ensure it runs successfully, passes, and increases code coverage

This seems to be based on the cover agent open source which implements Meta's TestGen-LLM paper. https://www.qodo.ai/blog/we-created-the-first-open-source-im...

After generating each test, it's automatically run — it needs to pass and increase coverage, otherwise it's discarded.

This means you're guaranteed to get working tests that aren't repetitions of existing tests. You just need to do a quick review to check that they aren't doing something strange and they're good to go.

1 comments

What the reasoning behind generating tests until they pass? Isn't the point of tests to discover erroneous corner cases?

What purpose does this serve besides the bragging rights of 'we need 90% coverage otherwise Sonarqube fails the build'?

Unit tests are more commonly written to future proof code from issues down the road, rather than to discover existing bugs. A code base with good test coverage is considered more maintainable — you can make changes without worrying that it will break something in an unexpected place.

I think automating test coverage would be really useful if you needed to refactor a legacy project — you want to be sure that as you change the code, the existing functionality is preserved. I could imagine running this to generate tests and get to good coverage before starting the refactor.

>Unit tests are more commonly written to future proof code from issues down the road, rather than to discover existing bugs. A code base with good test coverage is considered more maintainable — you can make changes without worrying that it will break something in an unexpected place.

The problem is a lot of unit tests could accurately be described as testing "that the code does what the code does." If the future changes to your code also require you to modify your tests (which they likely will) then your tests are largely useless. And if tests for parts of your code that you aren't changing start failing when you make code changes, that means you made terrible design decisions in the first place that led to your code being too tightly coupled (or had too many side effects, or something like global mutable state).

Integration tests are far, far more useful than unit tests. A good type system and avoiding the bad design patterns I mentioned handle 95% of what unit tests could conceivably be useful for.

I disagree in my experience, poorly designed tests test implementation rather than behavior. To test behavior you must know what is actually supposed to happen when the user presses a button.

One of the issues with getting high coverage is that often tests need to be written for testing implementation, rather than desired outcomes.

Why is this an issue? As you mentioned, testing is useful for future proofing codebases and making sure changing the code doesn't break existing use cases.

When test look for desired behavior, this usually means that unless the spec changes, all tests should pass.

The problem is when you test implementation - suppose you do a refactoring, cleanup, or extend the code to support future use cases - the test start failing. Clearly something must be changed in the tests - but what? Which cases encode actual important rules about how the code should behave, and which ones were just tautologically testing that the code did what it did?

This introduces murkiness and diminishes the value of tests.