What We Learned from Analyzing 100M Bugs


	What We Learned from Analyzing 100M Bugs (instabug.com)
	60 points by okgabr 2894 days ago


rokob 2893 days ago

Pretty much all of these "most" findings are explainable by the distribution of the installed user base, i.e. they are not real results but just artifacts of the population sizes.


aprilledaughn 2893 days ago

Yes, you’re right! The report covers data from Instabug users only. We extracted data from 30K apps with a range of user base sizes, locales, devices, etc. I definitely agree with you, it’s not a definitive representation of the market, but we believe we have a good enough sample and that the findings are valuable for app developers.

We couldn’t find any other data on mobile bugs like this, so we decided to share what we have for the app dev community to have some benchmarks and insights.


minimaxir 2893 days ago

Sample size isn't the issue here, although a large heterogeneous sample is good.

The complaints are e.g. "Most bugs are reported from iPhones" because they are a very popular type of phone with the customers more likely to report bugs. It doesn't necessarily mean the iPhone is buggier than others.


aprilledaughn 2893 days ago

Right, we're definitely not saying the iPhone is buggier than others. I guess the issue is with the wording of the claim. It would be more accurate to say "Most bugs reported through Instabug are from iPhones."

That's also why we included the bugs/user data since it shows a completely different distribution across devices.


kbenson 2893 days ago

I think you're right for the most part, but there are some interesting gems in there that it would be really interesting to get more information on, such as the "Bugs/User vs. Device Manufacturer" graph.

Part of that is explained in the comments here by an employee saying they assume Google (and to a degree iPhone/iPad I'm sure) get increased numbers because devs might use them for testing and thus more bugs are seen, but that does raise interesting questions about why LG leads them all in that metric.

Almost all the graphs that show total bugs instead of being normalized to number of users show very little that is useful. One exception I noticed is the bugs vs. battery level graph, and that was only useful as a reminder that mobile devices spend a lot of time running while plugged in at full battery, which is just as easily said as inferred through a graph like that.


choward 2893 days ago

It's like those maps that are supposed to reveal something interesting but all they reveal is population.


keerthiko 2893 days ago

I believe the reference you were looking to make was

https://xkcd.com/1138/


oldgradstudent 2893 days ago

They haven't actually analyzed 100M bugs, they've analyzed a list of bug reports.

They haven't analyzed how quickly bugs are resolved, they've analyzed how quickly bugs are marked as resolved.

The distinction is important. Nowhere in the report is there any attempt to judge the quality of the data and its reliability.

Oh, and the honest answer to the question "Why did we create this report?" is probably PR.


aprilledaughn 2893 days ago

- You’re absolutely right, not all the 100M bug reports are actually bugs. However, we highlighted this under the "Time to Close" section: "These are most likely not programmatic bugs, but could be support issues or spam." How do you think we should highlight this more to avoid confusion?

- About bug resolution time, Instabug is used by many companies as their main bug reporting tool or they forward these bugs to another bug tracker like Jira and we have a two-way sync so whenever it gets resolved over there, it’s resolved at Instabug as well. That’s why we used the word "resolved" not "fixed" because each company has their own definition. I hope this makes sense.

- About the quality and reliability of the data: Oh, we didn’t mean to be protective about this! On the contrary, we’d love to get your feedback. What would you like to know?

- About your last point, I respectfully disagree. As the person who spent the most hours working on this report, I can tell you honestly that it was not for PR. We just wanted to put something out there that would hopefully be valuable to the people in our community. We initially shared this with our own users for them to have benchmarks. This is the first time we’ve released anything like it, so it was an experiment for us to be honest, and I’m loving all these comments because they help us know what to do better next time around.


oldgradstudent 2893 days ago

Sorry if I impugned your motives.

I think that this report is only useful in showing app developers that the patterns they encounter in their bug reports are common in the entire ecosystem, not special to their specific app and requiring further investigation. Keep the patterns, keep the information about integration with external tools (customers might find it useful). Scrap the rest.

The main problem in the report is that you try to answer questions which your data and analysis are inherently incapable of answering. For example:

- "Which manufacturers have the most bugs?"
- "Which UI orientation has more issues?"
- "Which locale has the most bugs?"
- "How does battery affect app stability?"
- "Which OS has buggier apps?"

As other commenters have mentioned, your results could be just artifacts of the user demographics (or any number of other confounders). The answers are, at best, meaningless.

There are significant inconsistencies in figures 1 and 2. They definitely do not agree with "Errors discovered through Instabug are most likely to be resolved within 24 hours of being reported." (except in the narrow technical sense of the first day being the most likely day).

Even if the data were sufficient, there's no mention of statistical significance in the comparisons. For example, Danish is the locale with the most bugs per user. However, you have quite a lot of locales, and random variability is expected. Is the difference statistically significant?
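To illustrate the point about random variability across many locales, here is a rough simulation sketch. Everything in it is hypothetical: the 50 locales, their user counts, and the rule that every user has the same expected 1.0 bugs are assumptions for illustration, not figures from the report.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

TRIALS_PER_USER = 10   # hypothetical: each user gets 10 chances to report a bug
P_REPORT = 0.1         # hypothetical: 10% chance each, so 1.0 expected bugs per user


def simulate_locale(n_users):
    """Return observed bugs/user for a locale whose true rate is 1.0."""
    bugs = sum(
        1 for _ in range(n_users * TRIALS_PER_USER) if rng.random() < P_REPORT
    )
    return bugs / n_users


# 50 made-up locales with very different user counts, all with the same true rate.
sizes = [20, 50, 100, 500, 2000] * 10
rates = [(simulate_locale(n), n) for n in sizes]

top_rate, top_size = max(rates)
print(f"top locale: {top_rate:.2f} bugs/user from {top_size} users (true rate is 1.00)")
```

Even though no locale is actually buggier here, some locale always ends up on top, and in runs like this it tends to be one with few users. A significance test, or at least per-locale confidence intervals, would be needed before reading anything into the ranking.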


arriu 2893 days ago

Agreed, I was excited to find out whether data about the bugs could have been used to predict future bugs.


aprilledaughn 2893 days ago

Is there anything specific you'd like to know more about? It would be great to understand what kind of info people are looking for to know what to publish in the future.


coldtea 2893 days ago

I expected something much more interesting: e.g. most common types of bugs, or causes of bugs -- and thus suggestions on how they could be avoided.


okgabr 2893 days ago

Good point and sorry to disappoint you! The good news is that we still have a lot to share. This is the first time in six years that we've dug deeper into our data and shared it with the community. I’m sure we’ll do more and more soon. A series about the most common causes of bugs and suggestions on how they could be avoided would definitely be a great start!


dbwest 2893 days ago

On my Pixel 2 the download does not work for their report. I submit that bug so they can analyze it as their 100M+1.


LiamPa 2893 days ago

Ironic that I have a huge ‘download report’ banner across the middle of the screen (iPad Pro)


aprilledaughn 2893 days ago

Thanks for reporting this bug! Yup, Instabug has bugs too.


bradjohnson 2893 days ago

I think the y axis: "% Bugs" in Fig. 1 has an incorrect scale. It doesn't add to 100%.

Also, it doesn't seem consistent with the claim: "Bugs discovered through Instabug are most likely to be resolved within 24 hours of being reported"


barbegal 2893 days ago

Yeah going by that graph it appears that ~1.5% of bugs are fixed in 24 hours, ~5% within a week, ~10% within 30 days and only ~13% of bugs are fixed at all. That leaves 87% of bugs still to be resolved.

From the graph of total bugs vs. time, it appears that ~10% of all bugs have been reported in the last 30 days. Even if all those bugs were magically fixed tomorrow, that would only be ~20% of bugs resolved within 30 days, so we can claim:

"Bugs discovered through Instabug are unlikely to be resolved within 30 days" and "1.5% of bugs discovered through Instabug are likely to be resolved within 24 hours of being reported"
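The arithmetic in this comment can be replayed in a few lines. This is only a sketch: the percentages are the approximate values eyeballed from the graphs in the comment above, not exact figures from the report, and the variable names are made up for illustration.

```python
# Approximate cumulative shares of ALL reported bugs that were resolved,
# as read off Fig. 1 in the comment above (eyeballed, not exact).
resolved_within = {"24 hours": 1.5, "7 days": 5.0, "30 days": 10.0, "ever": 13.0}

# Whatever was never marked resolved.
unresolved = 100.0 - resolved_within["ever"]

# Share of all bugs reported in the last 30 days (also eyeballed).
reported_last_30_days = 10.0

# Best case: every one of those recent bugs gets resolved within its first 30 days.
best_case_within_30_days = resolved_within["30 days"] + reported_last_30_days

print(f"unresolved: ~{unresolved:.0f}%")
print(f"best case resolved within 30 days: ~{best_case_within_30_days:.0f}%")
```

Under these rough numbers, even the optimistic upper bound leaves about four out of five bugs unresolved after 30 days.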


ericpauley 2893 days ago

The figure is showing percent of all bugs, not percent of resolved bugs. Likely the rest of the 100% is unresolved bugs.

The confusion on the second point hinges on "most likely". You're likely interpreting that as the expectation of resolution time, whereas they are using maximum likelihood estimation. MLE is rather useless in this case, but it is technically still correct.
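A small sketch of how both readings can hold at once. The per-day shares below are hypothetical: they take the approximate figures quoted elsewhere in the thread (~1.5% resolved on day 1, ~5% by day 7, ~10% by day 30) and spread them evenly inside each bucket, which is an assumption made purely for illustration.

```python
# Hypothetical share of ALL bugs resolved on each day after being reported.
daily_share = {1: 1.5}
for day in range(2, 8):        # days 2..7 split the remaining 3.5% evenly
    daily_share[day] = (5.0 - 1.5) / 6
for day in range(8, 31):       # days 8..30 split the next 5% evenly
    daily_share[day] = (10.0 - 5.0) / 23

# "Most likely" in the narrow sense: the single day with the largest share.
most_common_day = max(daily_share, key=daily_share.get)

share_on_day_one = daily_share[1]
share_within_30_days = sum(daily_share.values())

print(f"most common resolution day: {most_common_day}")
print(f"resolved on day 1: {share_on_day_one:.1f}% of all bugs")
print(f"resolved within 30 days: {share_within_30_days:.1f}% of all bugs")
```

Day 1 is the most common single day for a resolution, yet only a small fraction of all bugs is resolved that day, so the headline is technically true while saying little about what a typical bug experiences.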


okgabr 2893 days ago

You’re right! Thanks for adding your thoughts to clarify. The wording “most likely” could be confusing indeed. How about we change it to “Bugs discovered through Instabug are most often resolved within 24 hours of being reported”? Would that be clearer? We could also say that this is the percent of all bugs, not the percent of resolved bugs.


bradjohnson 2893 days ago

Yep, that clears up my confusion. Thanks


darkhorn 2893 days ago

This is what happens when you hire computer science graduates as data scientists: you have incorrect data that you believe is correct. On the other hand, those who hire statisticians are more successful in collecting and analyzing data.


hinkley 2893 days ago

As one of my better bosses used to say:

    Graphs are for asking better questions, not for making decisions.

When I use graphs to brainstorm ways to verify the existence of a problem, I have a lot better time than when we jump to conclusions. There's something a little rotten in pretty much any projection of data that you try. Building policy off of a graph is a bad, bad plan.

Such a bad plan in fact that Mark Twain has a joke about it.


fhood 2893 days ago

Hmm, all this is basically what I would expect....wait..."Most bugs are reported from iPhones, while more bugs/user are reported from LG devices."

....Why LG?


aprilledaughn 2893 days ago

Thanks for checking out the report! I'm part of the team who put it together :)

Yeah, we thought LG was interesting too. When it comes to Android, we expected Samsung to take first place tbh, but we found more bugs/user reported from LG and Google devices (Fig. 9). This could be explained by our technical user base and the popularity of Nexus devices with Android developers. So the higher proportion of bugs/user we see reported is most likely due to internal beta testing by devs.

We went into this with some expectations and were surprised by some other findings as well... like Danish being the top locale where bugs/user are reported from :D


darkhorn 2893 days ago

>Most bugs are reported from iPhones

So this kind of means that there are more iPhone users in Instabug? Let's say the population is 100: 70 iPhone, 10 Samsung, 10 LG, 10 Nokia. The 70 iPhone users have 70 bugs, that is 1 bug per user, and similarly 1 bug per Samsung user and 1 bug per Nokia user. But there are 15 bugs for the 10 LGs, that is 1.5 bugs per LG user. In short, 70 bugs on iPhone is actually the same as, or maybe better than, Samsung. The total tells you nothing except that there are more iPhone users. The only useful information is that LG has more bugs per user. In other words, the most useful information is the percentage.

I'm sure you don't have any statistician in your work environment, because if you had, he would say "let's remove that 'Most bugs are reported from iPhones' part because it makes no sense". Guys, don't hire computer science graduates as statisticians (buzzword: data scientist); hire statistics graduates as statisticians (buzzword: data scientist).
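The toy numbers in this comment can be checked directly. A minimal sketch, using only the made-up population from the comment (these are not figures from the report):

```python
# Made-up population from the comment above: users and reported bugs per device.
users = {"iPhone": 70, "Samsung": 10, "LG": 10, "Nokia": 10}
bugs = {"iPhone": 70, "Samsung": 10, "LG": 15, "Nokia": 10}

# Normalize by the number of users on each device.
bugs_per_user = {device: bugs[device] / users[device] for device in users}

most_total = max(bugs, key=bugs.get)
most_per_user = max(bugs_per_user, key=bugs_per_user.get)

print(f"most bugs in total: {most_total}")
print(f"most bugs per user: {most_per_user} ({bugs_per_user[most_per_user]} bugs/user)")
```

The raw total only mirrors how many users each device has, while the normalized rate is what singles out LG in this example.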


chatmasta 2893 days ago

Or maybe it’s the opposite, i.e. devs are not testing on LG devices so they miss corner cases and ship bugs to them?


aprilledaughn 2893 days ago

Interesting! Could be. Our analysis is based on what we know about our users' behavior but certainly not definitive. All the data here is open to interpretation.


1_800_UNICORN 2893 days ago

I'm very confused...

"Errors discovered through Instabug are most likely to be resolved within 24 hours of being reported" is one of the TL;DR points, but only ~1.5% of bugs are resolved within 24 hours.
