by debbiedowner 893 days ago
This has been my job for over 2 years now!

We do it at the symbol level: we statically analyze each change, and everything in the monorepo daily. Our remedy for high-risk changes is to run more tests (client tests, not unit tests). Sometimes there are 100k client tests to pick from, so we rank them and run a small subset.
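
To make the "rank them and run a small subset" step concrete, here is a minimal sketch, not the commenter's actual system. It assumes each test already has a relevance score and the change has a risk score in [0, 1]; the names `select_tests`, `min_tests`, and `max_tests` are hypothetical, and the linear budget rule is just one simple choice.

```python
def select_tests(scores, risk, min_tests=1, max_tests=100):
    """Return the top-ranked tests; a riskier change gets a larger test budget.

    scores: dict mapping test name -> relevance score (assumed precomputed).
    risk:   risk score of the change, clamped to [0, 1] (assumed precomputed).
    """
    risk = min(max(risk, 0.0), 1.0)
    # Budget grows linearly with risk between min_tests and max_tests.
    budget = min_tests + int(risk * (max_tests - min_tests))
    # Highest score first; ties broken by name so the order is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:budget]


# Hypothetical scores for four client tests.
scores = {"t_a": 0.9, "t_b": 0.4, "t_c": 0.7, "t_d": 0.1}
low_risk = select_tests(scores, risk=0.0, max_tests=4)
high_risk = select_tests(scores, risk=1.0, max_tests=4)
```

With these toy inputs a zero-risk change runs only the single best-ranked test, while a maximum-risk change runs all four in ranked order.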

It is a hard problem. One interesting observation is that there is a culprit symbol or two in the culprit change, but their connectivity is very similar to that of the non-culprit symbols in the same change.

Another observation is that the transitively modified call graph after a change is pretty big; a depth of 50 is not unusual. It is hard to get many useful signals out of it beyond the amount of overlap in transitively affected symbols between the change and the test.
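
The overlap signal described above can be sketched roughly like this. It is an illustration under assumptions, not the commenter's implementation: the toy graphs, the symbol names, and the choice of Jaccard similarity as the overlap measure are all hypothetical.

```python
from collections import deque


def transitive_closure(graph, roots, max_depth=None):
    """Return every symbol reachable from `roots` by following edges in `graph`."""
    seen = set(roots)
    queue = deque((r, 0) for r in roots)
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand past the depth limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def overlap_score(affected, test_reach):
    """Jaccard overlap between symbols a change affects and symbols a test reaches."""
    union = affected | test_reach
    return len(affected & test_reach) / len(union) if union else 0.0


# Hypothetical toy call graph, stored in both directions.
callers = {"parse": ["load"], "load": ["main", "sync"], "fmt": ["report"]}  # callee -> callers
calls = {  # caller -> callees
    "test_sync": ["sync"], "sync": ["load"], "load": ["parse"],
    "test_report": ["report"], "report": ["fmt"],
}

# Symbols transitively affected by a change to `parse` (walk up through callers).
affected = transitive_closure(callers, {"parse"})
# Score each test by how much of what it reaches (walk down through callees) is affected.
scores = {
    test: overlap_score(affected, transitive_closure(calls, {test}))
    for test in ("test_sync", "test_report")
}
```

Here `test_sync` reaches three of the affected symbols and scores well above `test_report`, which reaches none. The `max_depth` parameter is there because, at depths like 50, an uncapped closure can get very large.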

We found file-level and build-target-level granularity to be too coarse, but AST symbols are working.

2 comments

Really interesting!! I wanted to implement this kind of system at Wikimedia, but I quit my release engineering job at the beginning of 2022. I still think about this specific problem pretty often, though. I never thought to use the score to determine how much testing needs to be done! That's actually really genius! If I had thought of that, I probably could have pitched it and gotten more people behind the whole risk-scoring idea. Overall testing times were getting really long on Wikimedia's codebase, and targeted testing could have had real benefits for the velocity of changes through the pipeline (with associated knock-on effects on developer productivity and job satisfaction).
100k client tests sounds like a lot. Are they integration tests or UI tests? How many tests do you have overall? I’m just curious.
We add support project by project, and the prototypical project we started with had 1M test reverse dependencies. A quarter of those were eligible test targets that we could recommend (based on the language they are written in). This is probably the biggest project that we would ever find to support in the monorepo.

Some are UI tests, but we don't recommend those: we found they don't catch breakages as often, so we don't support the language they're written in. The tests we recommend are often integration-type tests, in that they call very high-level functions, and often many of them.