Hacker News
by godelski 710 days ago
I think this is really important in that it is bigger than "code reviews." It shows how badly people misunderstand statistics[0]. And what's even funnier is that, at surface level, the claim that code review "does nothing" __sounds__ ludicrous. But people "believe" it because they are annoyed with code review, not because they "actually" believe the results.

But statistics are tricky. Take the example given in the article: "15% of smokers get lung cancer" compared to "80% of people with lung cancer smoke." These two are not in contradiction with one another; they are just different ways to view the same thing. In fact, this is often how people will mislead you (or how you may unintentionally mislead yourself!) with statistics.
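To see that both figures can hold at once, here is a minimal sketch with a made-up population. Only the 15% and 80% come from the quoted example; the population size, the number of smokers, and the number of non-smokers with lung cancer are assumptions picked so the arithmetic comes out clean.

```python
# Hypothetical population: only the 15% and 80% figures come from the quote.
population = 10_000
smokers = 1_600                       # assumed for illustration
non_smokers = population - smokers    # 8,400

smokers_with_cancer = smokers * 15 // 100   # 15% of smokers
non_smokers_with_cancer = 60                # assumed for illustration


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole


total_cancer = smokers_with_cancer + non_smokers_with_cancer

# Same 240 people in the numerator, two different denominators.
p_cancer_given_smoker = share(smokers_with_cancer, smokers)
p_smoker_given_cancer = share(smokers_with_cancer, total_cancer)

print(f"P(cancer | smoker) = {p_cancer_given_smoker:.0f}%")
print(f"P(smoker | cancer) = {p_smoker_given_cancer:.0f}%")
```

The numerator is the same group of people in both cases; only the denominator changes, which is why the two percentages can look so different without contradicting each other.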

Another famous example is one that hits HN every once in a while: "Despite just 5.8% sales, over 38% of bug reports come from the Linux community"[1]. In short, this one is about how Linux users are just more trained to make bug reports and how most bugs are not system specific. So if you just classify bugs by the architecture of those submitting them, you'll actually miss out on a lot of valuable information. And because of how statistics work, if the architecture dependence rate was as low as even 50% (I'd be surprised!) then that's still a huge amount of useful bug reports. As a Linux user, I've seen these types of bugs, and they aren't uncommon. But I've frequently seen them dismissed because I report from a Linux system. Or worse, support sends you to their page that requests you to "upvote" a "feature" or bug issue. One you have to log in to. I can't take a company like that seriously but hell, Spotify did that to me and I'd sent them the line of code that was wrong. And Netflix did it to me, saying "We don't block Firefox," but switching user agents gave me access. Sometimes we've got to just think a bit more than surface level.
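A rough back-of-the-envelope version of the 38% argument above, as a sketch. The 5.8% and 38% figures come from the linked headline; treating the 50% as the share of Linux-submitted reports that are platform-specific is one reading of the comment, and the variable names are made up for illustration.

```python
sales_share = 0.058             # Linux share of sales (from the headline)
report_share = 0.38             # Linux share of bug reports (from the headline)
platform_specific_rate = 0.50   # assumed pessimistic case from the comment

# Share of ALL bug reports that come from Linux users yet affect everyone.
general_bug_share = report_share * (1 - platform_specific_rate)

# How that compares with the share of sales those users represent.
ratio = general_bug_share / sales_share

print(f"General bugs reported by Linux users: {general_bug_share:.0%} of all reports")
print(f"That is about {ratio:.1f}x their share of sales")
```

Even under that pessimistic assumption, the reports that matter to every platform are several times larger than the sales share would suggest.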

So I guess I wanted to say, there's a general lesson here that can be abstracted out.

[0] Everyone jokes that stats are made up, but this is equally bad.

[1] https://news.ycombinator.com/item?id=38392931

2 comments

> support sends you to their page that requests you to "upvote" a "feature" or bug issue.

Microsoft does this for enterprise products where customers might be paying $100K/mo or even millions.

“We hear you, but your complaint is just not popular enough so go away.”

“Sure it’s a catastrophic data loss bug that ate your finance transactions, but if other people can’t identify that their seemingly unrelated crash is the exact same issue then no fix for you.”

“Now that you did get ten thousand votes on an issue titled ‘Consider doing your job’, we’ve decided to improve your experience by wiping out the bug forum and starting a new one from scratch that has fewer scathing comments from upset users.”

My company/team has very different processes for bugs vs feature requests. If a customer opens a ticket and we determine it's a bug, we will generally fix it in the reported release and later (unless it's a security vulnerability or other major problem). But for feature requests we just tell them to submit it to a community and we evaluate it to see if it's valid and something we'd likely implement given the other work we have on our plate, but not necessarily do it any time soon.
Sometimes feature requests are actually bugs and can be illustrative of one not properly understanding design.

But I think it is important how user feature requests are interpreted. They have a frustration that you might not be aware of, but they aren't aware of all the code and constraints. It can even be in the design, which is still important. Very often there is a way to resolve a feature request that is not what the user explicitly asks for. But to do that you have to read between the lines, and carefully. Of course, some people go completely the wrong way with this and cough Apple cough decide that they know what is best for the user. It's totally a hard balance to strike, but I think it is very common for it to be framed as much simpler than it is.

There's the joke that the user is dumb, and maybe they are, but that doesn't mean the issue they face is. It's not always dumb when a person pulls on a door that says push, because it may actually be that the sign and design are saying different things[0]. And personally, I like when users suggest methods of resolving the problem. I might throw that in the garbage, but it can often give me better context clues as to what they're trying to ask for, and it really does tell me if they're thinking hard about the problem and that they care about the product. They just don't have the same vantage point that I do, and that's okay.

[0] https://www.youtube.com/watch?v=yY96hTb8WgI

> Sometimes feature requests are actually bugs

You can have two missing features that add up to a bug in total. For example, I worked with two cloud products from the same vendor where a missing back-end HTTP feature of the CDN product interacted with a missing HTTP front-end feature of the PaaS service such that the two products that have a "natural fit" together couldn't actually be used in combination.

This made many architectures that ought to have worked a no-go, forcing customers into contorted design patterns or third-party products.

IMHO this is a bug ("Can't use your products"), but each team individually marked it as a missing feature and then they just ignored this for about three years.

Also: not enough people voted the missing features up because not enough people were using the products... because they couldn't.

I know this is a bit off-topic here, but it circles back to the "statistics is hard" intro in the original blog article. You can make catastrophic business mistakes relying on statistics you don't fully understand, such as this example of "you won't get many complaints for unusable products".

You will get many complaints however for the usable products... they have users to complain.

https://en.wikipedia.org/wiki/Survivorship_bias
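A toy sketch of why complaint counts mislead here. Every number below is invented for illustration: the point is only that complaints scale with the number of people who can use the product at all.

```python
def expected_complaints(active_users, complaint_rate):
    """Complaints can only come from people who actually use the product."""
    return active_users * complaint_rate


# Hypothetical figures: a usable product with a small share of unhappy users,
# and a nearly unusable one where almost everyone who tries it is unhappy.
usable = expected_complaints(active_users=50_000, complaint_rate=0.01)
unusable = expected_complaints(active_users=20, complaint_rate=0.90)

print(f"usable product:   ~{usable:.0f} complaints")
print(f"unusable product: ~{unusable:.0f} complaints")
```

Ranked by complaint (or vote) count alone, the broken product looks like the healthier one, because the people it failed never stuck around to be counted.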

> because not enough people were using the products... because they couldn't.

I don't think this is off topic at all. I think it is explicitly on topic, at least the underlying one. It's not just that statistics are hard; it's hard to measure things and even harder to determine causality. Which is often the underlying goal of statistics and data science: to find out why things happen. Measurements are incredibly difficult and people often think they are simple. The problem is that whatever you're measuring is actually always a proxy and has uncertainty. Often uncertainty you won't know about if you don't have a good understanding of what the metric means. You'll always reap the rewards when putting in the hard work to do this, but unfortunately if you don't it can take time before the seams start to crack. I think this asymmetry is often why people get sloppy.

The example I like to use is the confusion around COVID statistics, and how people mis-interpreted them.

For example, the rate of infections (or deaths) per day that was reported regularly in the news is actually: rate of infections * measurement accuracy * rate of measurement.

I.e.:

If more people turn up to be tested, the "rate" would go up.

If the PCR tests improved, the "rate" would go up.
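The product above can be sketched in a few lines. The function name, parameter names, and all the numbers are assumptions for illustration; real surveillance data is far messier than a single multiplication.

```python
def reported_rate(true_infection_rate, test_accuracy, testing_rate):
    """Reported rate as the product described above.

    true_infection_rate: fraction of the population actually infected
    test_accuracy:       fraction of infected people a test catches
    testing_rate:        fraction of infected people who get tested
    """
    return true_infection_rate * test_accuracy * testing_rate


# Same underlying epidemic in all three cases (hypothetical 2% infected).
baseline = reported_rate(0.02, test_accuracy=0.80, testing_rate=0.10)
more_testing = reported_rate(0.02, test_accuracy=0.80, testing_rate=0.20)
better_tests = reported_rate(0.02, test_accuracy=0.95, testing_rate=0.10)

print(f"baseline:     {baseline:.4%}")
print(f"more testing: {more_testing:.4%}")
print(f"better tests: {better_tests:.4%}")
```

In this sketch the true infection rate never changes, yet the reported number doubles when twice as many people get tested and rises again when the test gets better.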

A similar thing applies with hospitalisations and deaths. It might go up because a strain is more lethal than another strain, or because more people are infected with the same strain, or because more deaths are attributed to COVID instead of something else.

It doesn't help that different countries have different reporting standards, or that reporting standards changed over time due to the circumstances!

Etc...

It's complicated!

You mean they don't censor the bug reports and try to gaslight you into believing their software is flawless anymore?

That's a tremendous improvement when compared to the time I interacted with them.

> don't censor the bug reports

They do, but eventually even the polite but grumpy comments build up to the point that it looks bad. These comments are public -- that's the whole point -- so the only way to hide them is to delete them. Normally this upsets users even more, so the "trick" is to "improve" the service by dropping the entire forum on the floor and starting over with a new piece of software. Not because it's better in any way, but because it is an implicit DELETE FROM "BUGS".

Microsoft is on their... what... third forum now? I lost count.

Basically, code reviews also happen to find a lot of other non-bug stuff (probably nits and style issues).

That's why looking at % is dangerous. You could be finding 5 bugs per code review, which is a lot, but if you also make 30 other non-bug comments, suddenly "only 15% of comments are bugs".
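The same arithmetic as a small sketch, using the 5 and 30 from the comment plus two made-up comparison cases. The function name is hypothetical.

```python
def bug_share(bug_comments, other_comments):
    """Fraction of review comments that are about bugs."""
    return bug_comments / (bug_comments + other_comments)


bugs_found = 5  # identical in every scenario below

for other in (0, 30, 95):
    print(f"{bugs_found} bugs + {other:>2} other comments "
          f"-> {bug_share(bugs_found, other):.1%} of comments are bugs")
```

With 30 other comments the share comes out at roughly 14%, close to the "15%" in the comment. The number of bugs caught is 5 in every row; only the amount of non-bug commentary moves the percentage.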

Oh I completely agree. There are just a lot of things that can't so easily be measured and many things that can never be. But that doesn't mean they don't matter. Following the point you're making, enforcing good style can result in bugs not happening later on or even save a lot of future time as your code doesn't slowly spaghetti. And I think that's one point people often miss. That spaghettification generally happens through a slower process. By dozens of commits, not by a handful.