This is yet another paper where the title exaggerates the importance of the conclusion. I found a copy of the original paper, and the authors themselves mention several risks to the model:

- They only mutated conditionals to test coverage, not any of the other kinds of errors a test suite may be looking for.
- Equivalent mutations may be miscounted, particularly when developers test for a lot of off-by-one errors.
- Their data may not meet the assumptions of the Kendall's τ correlation they used.
- Most of their data had low levels of coverage, and previous research shows that high coverage is needed before it is related to effectiveness.
- They did not account for object-oriented boilerplate code (getters/setters) that does not need to be tested, so their counts could be off. This is major, as they were using only Java projects.
- They had very narrow inclusion criteria, so the results may not generalize to all codebases. The projects had to have over 1,000 tests written, and the average LOC was generally in the 100k range.

This honestly sounds like some grad student's final project, not advice for real-world consumption.