

by itamarcode 765 days ago

Hey, co-creator here. I agree with the sentiment that code coverage may be a proxy, and sometimes even a vanity metric, but IMO unit regression tests are still necessary for a maintainable production codebase. I personally don’t feel confident changing production code that isn’t tested.

For generating unit regression tests specifically, Cover-Agent already works quite well in the wild for some projects, especially isolated ones (as opposed to complex enterprise-level code). The few examples we posted [0] show it generating working tests that increase coverage. They are somewhat cherry-picked, in the sense that they are projects we like to work with often internally at CodiumAI.

I believe it’s possible to generate additional meaningful tests, including end-to-end tests, with a more sophisticated flow: for example, one that uses prompting techniques such as reflection on the code and the existing tests, and generates the tests iteratively, feeding errors and failures back to the LLM so it can fix them. This is somewhat similar to the approach we used with AlphaCodium [1], which hit 54% on the CodeContests benchmark (DeepMind’s AlphaCode 2 hit 43% [2] with an equivalent number of LLM calls).

If, like me, you think tests are important but hate writing them, please consider contributing to the open-source project to help it work better for more use cases: https://github.com/Codium-ai/cover-agent

[0] https://www.youtube.com/@Codium-AI/videos
[1] https://github.com/Codium-ai/AlphaCodium
[2] https://storage.googleapis.com/deepmind-media/AlphaCode2/Alp...