Hacker News
Ask HN: How do you separate intentional test boilerplate from real duplication?
7 points by rafaepta 2 days ago
I maintain an open-source project (a deterministic duplicate-code detector), and a user has asked for a feature that I'm not sure how to implement.

This seems like a genuinely hard problem:

- Tests repeat the same scenario. A structural detector flags this as repetition (duplication), but tests are not something people want to delete from their codebases.

- This intentional repetition in tests ends up looking like unwanted code duplication, and the tool cannot tell which is which.

- One option is a human in the loop: much like linters, let the user accept a finding once, while keeping the default first run zero-config.
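A minimal sketch of the "accept once" idea, in Python for illustration. It assumes each finding can be reduced to a stable fingerprint; the `Finding` type, its fields, and the `baseline.json` file are hypothetical and not dupehound's actual format. The first run reports everything with zero configuration, accepting writes the current fingerprints to a baseline, and later runs report only findings that are not in it yet.

```python
import hashlib
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding: two files sharing a structurally identical fragment."""
    structural_hash: str
    file_a: str
    file_b: str

    def fingerprint(self) -> str:
        # Line numbers are left out on purpose, so unrelated edits elsewhere
        # in a file do not invalidate a finding the user already accepted.
        raw = "|".join([self.structural_hash, *sorted([self.file_a, self.file_b])])
        return hashlib.sha256(raw.encode()).hexdigest()[:16]


def load_baseline(path: Path) -> set[str]:
    """Accepted fingerprints; an empty set on a zero-config first run."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def save_baseline(path: Path, findings: list[Finding]) -> None:
    """Accept every current finding by recording its fingerprint."""
    path.write_text(json.dumps(sorted(f.fingerprint() for f in findings), indent=2))


def new_findings(findings: list[Finding], accepted: set[str]) -> list[Finding]:
    """Only the findings the user has not accepted yet."""
    return [f for f in findings if f.fingerprint() not in accepted]


if __name__ == "__main__":
    baseline = Path(tempfile.mkdtemp()) / "baseline.json"
    first_run = [Finding("h1", "tests/test_a.py", "tests/test_b.py")]
    print(len(new_findings(first_run, load_baseline(baseline))))  # 1: nothing accepted yet
    save_baseline(baseline, first_run)  # the user accepts the test duplication
    second_run = first_run + [Finding("h2", "src/x.py", "src/y.py")]
    print(len(new_findings(second_run, load_baseline(baseline))))  # 1: only the new src/ finding
```

Leaving line numbers out of the fingerprint is a trade-off: accepted test duplication survives unrelated edits, but a clone that moves to a different file shows up again.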

I'm curious how you have seen this handled, and whether anyone has ideas.

Here is the repo: https://github.com/Rafaelpta/dupehound

And here is the issue with more detail: https://github.com/Rafaelpta/dupehound/issues/23

4 comments

I’ve dealt with a question that rhymes with this.

SonarQube or CodeQL reports might tell me what percentage of a repo is duplicated code, and a large share of that is in src/test/java.

Often this is not just an idle observation but a clue that I should be using a mechanism like @ParameterizedTest instead of @Test. So I rewrite those tests in a way that makes setup, parameters/constraints, inputs, and outputs easier to define. Sometimes it gets a little convoluted, because you either use a lot of bare Arguments.of() calls or define nested records scoped to the test class to encapsulate test parameters, inputs, expected outputs, etc.
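JUnit's @ParameterizedTest has a close analogue in Python's standard library: `unittest.TestCase.subTest` looping over a table of cases. A sketch under that substitution, where `slugify` is a hypothetical function under test and the `Case` NamedTuple plays the role of the nested record holding inputs and expected outputs:

```python
import re
import unittest
from typing import NamedTuple


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, hyphen-separated words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class Case(NamedTuple):
    """One row of the test table, like a test-scoped record in Java."""
    name: str
    given: str
    expected: str


CASES = [
    Case("plain words", "Hello World", "hello-world"),
    Case("extra spaces", "  Hello   World  ", "hello-world"),
    Case("punctuation", "Hello, World!", "hello-world"),
    Case("digits kept", "Top 10 Tips", "top-10-tips"),
]


class SlugifyTest(unittest.TestCase):
    def test_slugify(self) -> None:
        # One test body instead of four near-identical test methods; subTest
        # reports each failing row separately, as a parameterized test would.
        for case in CASES:
            with self.subTest(case.name):
                self.assertEqual(slugify(case.given), case.expected)
```

Run it with `python -m unittest`. Four copies of one test shape collapse into a single body plus a data table, which gives a structural detector much less repeated code to flag.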

Detect tests somehow (e.g., in Rust you could check for #[test]) and just skip the analysis for those functions?
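A rough sketch of that "detect and skip" filter in Python, assuming path conventions plus test markers are good enough. The directory names, file patterns, marker regex, and function names below are illustrative assumptions, not an exhaustive or official list:

```python
import re
from pathlib import PurePosixPath

# Illustrative conventions only; a real tool would make these configurable.
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}
TEST_FILE_PATTERNS = [
    re.compile(r"^test_.*\.py$"),
    re.compile(r".*_test\.(py|go|rs)$"),
    re.compile(r".*Tests?\.java$"),
    re.compile(r".*\.(test|spec)\.[jt]sx?$"),
]
# Attribute/annotation lines that mark the following function as a test.
TEST_MARKERS = re.compile(r"^\s*(#\[(tokio::)?test\]|@Test\b|@ParameterizedTest\b)")


def is_test_file(path: str) -> bool:
    """True if the path sits in a test directory or has a test-style file name."""
    p = PurePosixPath(path)
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    return any(pat.match(p.name) for pat in TEST_FILE_PATTERNS)


def is_test_function(lines: list[str], fn_line: int) -> bool:
    """True if the function starting at 0-based fn_line is annotated as a test.

    Walks upward through the contiguous block of attributes/annotations
    directly above the function (e.g. #[test] then #[should_panic]).
    """
    i = fn_line - 1
    while i >= 0 and lines[i].strip().startswith(("#[", "@")):
        if TEST_MARKERS.match(lines[i]):
            return True
        i -= 1
    return False
```

This skips only functions marked as tests; unmarked helpers inside a Rust `#[cfg(test)]` module would still be analyzed, which may or may not be what a user wants.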
Maybe I don’t quite understand the question, but can't you just define a function that sets up the shared test state and call it from every test?
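A sketch of that shared-setup approach in Python, where `Order`, `make_order`, and `order_total` are hypothetical names: one builder returns a valid default object, and each test overrides only the fields it cares about.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical domain object that many tests need to construct."""
    customer: str
    items: tuple[str, ...]
    currency: str = "USD"
    discount: float = 0.0


def make_order(**overrides) -> Order:
    """Shared test setup: a valid default Order with per-test overrides.

    The repeated arrange code lives here once instead of in every test.
    """
    base = Order(customer="alice", items=("book",))
    return replace(base, **overrides)


def order_total(order: Order, prices: dict[str, float]) -> float:
    """Hypothetical code under test."""
    subtotal = sum(prices[item] for item in order.items)
    return round(subtotal * (1 - order.discount), 2)


PRICES = {"book": 10.0, "pen": 2.5}


def test_total_without_discount() -> None:
    assert order_total(make_order(), PRICES) == 10.0


def test_total_with_discount() -> None:
    assert order_total(make_order(items=("book", "pen"), discount=0.1), PRICES) == 11.25
```

Whether this satisfies a structural detector depends on the detector: the arrange code disappears, but the remaining test bodies can still look alike.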
What is a "structural detector"?
Clicking through to the repo linked at the end, it appears to be rolling-hash-style AST structural pattern matching that ignores things like the concrete names of identifiers.
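For anyone else wondering, here is a toy version of that idea using Python's standard-library `ast` module: node types become tokens (identifier names and literal values are discarded), and a Rabin-Karp rolling hash fingerprints every window of `k` consecutive tokens, so equal windows in different places surface as clone candidates. This is an illustrative sketch, not dupehound's implementation; the window size, hash constants, and function names are assumptions, and a real detector would also verify matches against hash collisions and merge overlapping windows into one report.

```python
import ast
import zlib
from collections import defaultdict

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus keeps accidental collisions rare


def structural_tokens(source: str) -> list[tuple[str, int]]:
    """Flatten the AST in source order into (node type, line) tokens.

    Identifier names and literal values are not recorded, so
    `result = result + p` and `acc = acc + w` yield the same tokens.
    """
    tokens: list[tuple[str, int]] = []

    def visit(node: ast.AST, parent_line: int) -> None:
        # Nodes without a position (contexts, operators) inherit the parent's line.
        line = getattr(node, "lineno", parent_line)
        tokens.append((type(node).__name__, line))
        for child in ast.iter_child_nodes(node):
            visit(child, line)

    visit(ast.parse(source), 1)
    return tokens


def rolling_hashes(symbols: list[int], k: int) -> list[int]:
    """Rabin-Karp hash of every window of k symbols, each update in O(1)."""
    if len(symbols) < k:
        return []
    high = pow(BASE, k - 1, MOD)  # weight of the symbol leaving the window
    h = 0
    for s in symbols[:k]:
        h = (h * BASE + s) % MOD
    hashes = [h]
    for i in range(k, len(symbols)):
        h = ((h - symbols[i - k] * high) * BASE + symbols[i]) % MOD
        hashes.append(h)
    return hashes


def find_duplicates(files: dict[str, str], k: int = 20) -> list[tuple[tuple[str, int], ...]]:
    """Group (file, start line) locations whose k-token windows hash equally."""
    buckets: dict[int, set[tuple[str, int]]] = defaultdict(set)
    for path, source in files.items():
        tokens = structural_tokens(source)
        # crc32 instead of hash(): str hashing is randomized per process,
        # which would make results differ between runs.
        symbols = [zlib.crc32(name.encode()) for name, _ in tokens]
        for i, h in enumerate(rolling_hashes(symbols, k)):
            buckets[h].add((path, tokens[i][1]))
    groups = {tuple(sorted(locs)) for locs in buckets.values() if len(locs) > 1}
    return sorted(groups)
```

The window size `k` trades recall for noise: small windows match trivial fragments such as `x = 0`, while large windows miss short clones. Because names are ignored, this is also exactly why deliberately repetitive tests light up.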