
by Lewisham 4135 days ago

So the follow-up paper that assesses the impact is here [1]

TL;DR is that developers just didn't find it useful. Sometimes they knew the code was a hot spot, sometimes they didn't. But knowing that the code was a hot spot didn't give them any means of effecting change for the better. Imagine a compiler that just said "Hey, I think this code you just wrote is probably buggy" but didn't tell you where, and even if you found the problem and fixed it, kept saying so because the code had been buggy recently. That's essentially what TWR does. That became understandably frustrating. We also have many other signals that developers can act on (e.g. FindBugs), and we risked drowning out those useful signals with this one.
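To make the "fix it and it still complains" point concrete, here is a rough sketch of a time-weighted score over bug-fix commits. The logistic weight and the constant 12 are assumptions for illustration, and the function name, timestamps, and variables are made up; none of this is taken from the paper or this thread.

```python
import math

def time_weighted_risk(fix_times, start, now):
    """Sum a recency weight over one file's bug-fix commit timestamps.

    Assumed weighting (illustrative only): 1 / (1 + e^(-12*t + 12)),
    where t is the commit time normalized to [0, 1] over the repo history.
    """
    span = now - start
    if span <= 0:
        raise ValueError("now must be later than start")
    score = 0.0
    for t in fix_times:
        norm = (t - start) / span  # 0.0 = start of history, 1.0 = now
        score += 1.0 / (1.0 + math.exp(-12.0 * norm + 12.0))
    return score

# Hypothetical file with three past bug fixes (arbitrary time units).
before = time_weighted_risk([10, 50, 90], start=0, now=100)
# The developer lands one more fix right now: the score goes up, not down.
after = time_weighted_risk([10, 50, 90, 100], start=0, now=100)
```

Under this assumed weighting the new fix counts as fresh evidence of bugginess, so the file looks riskier immediately after someone repairs it.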

Some teams did find it useful for getting individual team reports so they could focus on places for refactoring efforts, but from a global perspective, it just seemed to frustrate, so it was turned down.

From an academic perspective, I consider the paper one of my most impactful contributions, because it highlights to the bug prediction community some harsh realities that need to be overcome for bug prediction to be useful to humans. So I think the whole project was quite successful... Note that the Rahman algorithm that TWR was based on did pretty well in developer reviews at finding bad code, so it's possible it could be used for automated tools effectively, e.g. test case prioritization so you can find failures earlier in the test suite. I think automated uses are probably the most fruitful area for bug prediction efforts to focus on in the near-to-mid future.
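As a minimal sketch of the test case prioritization idea, assuming you already have a per-file risk score and a map from each test to the files it exercises (both hypothetical inputs here, as are all the names), you could simply run the riskiest tests first:

```python
def prioritize_tests(test_to_files, file_risk):
    """Order tests so those covering the riskiest files run first.

    test_to_files: dict of test name -> list of files the test exercises.
    file_risk: dict of file name -> risk score (higher = more suspect).
    Both inputs are hypothetical; files with no score count as 0.0.
    """
    def risk(test):
        return sum(file_risk.get(f, 0.0) for f in test_to_files[test])
    return sorted(test_to_files, key=risk, reverse=True)

# Made-up coverage map and scores, purely for illustration.
coverage = {
    "test_parser": ["parser.py"],
    "test_billing": ["billing.py", "tax.py"],
    "test_utils": ["utils.py"],
}
risk_scores = {"parser.py": 0.2, "billing.py": 1.4, "tax.py": 0.6, "utils.py": 0.0}

order = prioritize_tests(coverage, risk_scores)
```

Nobody has to read or act on the score here. It only changes the order of work that was going to happen anyway, which is why a noisy signal is much easier to live with in this setting.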

[1] http://www.cflewis.com/publications/google.pdf?attredirects=...

3 comments

nostrademons 4135 days ago

I was one of the interviewees for the study (or at least, I remember ranking those three lists as described in the experimental design).

My impression was that the results of the algorithm were pretty accurate, but they were not very actionable. Very often, the files identified were ones the team knew to be buggy, but there were good reasons they were buggy, e.g. the problem the code was solving was complex, that area of the code was undergoing heavy churn because the problem it solved was a high priority, or the code was ugly but another system was being developed to replace it, so it wasn't worth fixing when it was going to be thrown away anyway. In some cases, proposals to fix or refactor the code had been nixed repeatedly by executives.

Basically - not all bugs are created equal. Oftentimes code is buggy because it's important, and the priority is on satisfying user needs rather than fixing bugs.


sukilot 4135 days ago

This seems worth a follow-up post to mention that the idea didn't pan out.


riyadparvez 4135 days ago

I work in software reliability (bug finding through dynamic program analysis), which is a domain related to this research.

Most of these machine-learning-based software engineering research tools are built on unrealistic scenarios, are full of over-promises, and deliver very little in real life.
