Pretty much all of these "most" findings are explainable by the distribution of the installed user base, i.e. they are not real results but just artifacts of the population sizes.
Yes, you’re right! The report covers data from Instabug users only. We extracted data from 30K apps with a range of user base sizes, locales, devices, etc. I definitely agree with you, it’s not a definitive representation of the market, but we believe we have a good enough sample and that the findings are valuable for app developers.
We couldn’t find any other data on mobile bugs like this, so we decided to share what we have for the app dev community to have some benchmarks and insights.
Sample size isn't the issue here, although a large heterogeneous sample is good.
The complaint is about claims like "Most bugs are reported from iPhones". That happens because iPhones are a very popular type of phone, with customers who are more likely to report bugs. It doesn't necessarily mean the iPhone is buggier than others.
Right, we're definitely not saying the iPhone is buggier than others. I guess the issue is with the wording of the claim. It would be more accurate to say "Most bugs reported through Instabug are from iPhones."
That's also why we included the bugs/user data since it shows a completely different distribution across devices.
I think you're right for the most part, but there are some interesting gems in there which it would be really interesting to get more information on, such as the "Bugs/User vs. Device Manufacturer" graph.
Part of that is explained in the comments here by an employee saying they assume Google (and to a degree iPhone/iPad I'm sure) get increased numbers because devs might use them for testing and thus more bugs are seen, but that does raise interesting questions about why LG leads them all in that metric.
Almost all the graphs that show total bugs instead of bugs normalized to number of users show very little that is useful. One exception I noticed is the bugs vs. battery level graph, and that was only useful as a reminder that mobile devices spend a lot of time running while plugged in at full battery, which is as easily said as inferred from a graph like that.
- You’re absolutely right, not all the 100M bug reports are actually bugs. However, we highlighted this under the "Time to Close" section: "These are most likely not programmatic bugs, but could be support issues or spam." How do you think we should highlight this more to avoid confusion?
- About bug resolution time: many companies use Instabug as their main bug reporting tool, while others forward these bugs to another bug tracker like Jira. We have a two-way sync, so whenever a bug gets resolved over there, it’s resolved at Instabug as well. That’s why we used the word "resolved", not "fixed": each company has its own definition. I hope this makes sense.
- About the quality and reliability of the data: Oh, we didn’t mean to be protective about this! On the contrary, we’d love to get your feedback. What would you like to know?
- About your third point, I respectfully disagree. As the person who spent the most hours working on this report, I can tell you honestly that it was not for PR. We just wanted to put something out there that would hopefully be valuable to the people in our community. We initially shared this with our own users for them to have benchmarks. This is the first time we’ve released anything like it, so it was an experiment for us to be honest and I’m loving all these comments because it helps us know what to do better next time around.
I think that this report is only useful in showing app developers that the patterns they encounter in their bug reports are common in the entire ecosystem, not special to their specific app requiring further investigation. Keep the patterns, keep the information about integration with external tools (customers might find it useful). Scrap the rest.
The main problem in the report is that you try to answer questions which your data and analysis are inherently incapable of answering. For example:
- "Which manufacturers have the most bugs?"
- "Which UI orientation has more issues?"
- "Which locale has the most bugs?"
- "How does battery affect app stability?"
- "Which OS has buggier apps?"
As other commenters have mentioned, your results could be just artifacts of the user demographics (or any number of other confounders). The answers are, at best, meaningless.
There are significant inconsistencies in figures 1 and 2. They definitely do not agree with "Errors discovered through Instabug are most likely to be resolved within 24 hours of being reported." (except in the narrow technical sense of the first day being the most likely day).
Even if the data were sufficient, there's no mention of statistical significance in the comparisons. For example, Danish is the locale with the most bugs per user. However, you have quite a lot of locales, and random variability is expected. Is the difference statistically significant?
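To illustrate the point about random variability, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not data from the report: 50 hypothetical locales of 20 users each, all sharing the same true rate, with each user reporting either 0 or 1 bug. Even though no locale is actually buggier, the top-ranked locale still looks well above average.

```python
import random

def top_locale_rate(num_locales, users_per_locale, true_rate, rng):
    """Return the highest observed bugs/user across statistically identical locales."""
    rates = []
    for _ in range(num_locales):
        # Simplifying assumption: each user reports either 0 or 1 bug.
        reports = sum(rng.random() < true_rate for _ in range(users_per_locale))
        rates.append(reports / users_per_locale)
    return max(rates)

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_RATE = 0.5         # hypothetical: every locale has the same underlying rate
top = top_locale_rate(num_locales=50, users_per_locale=20, true_rate=TRUE_RATE, rng=rng)
print(f"true rate: {TRUE_RATE:.2f}, top locale observed: {top:.2f}")
```

A ranking of many groups always has a "winner", so the gap between the top locale and the rest needs to be compared against this kind of chance spread before it means anything.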
Is there anything specific you'd like to know more about? It would be great to understand what kind of info people are looking for to know what to publish in the future.
Good point and sorry to disappoint you! The good news is that we still have a lot to share. This is the first time in six years that we’ve dug deeper into our data and shared it with the community. I’m sure we’ll do more and more soon. A series about the most common causes of bugs and suggestions on how they could be avoided would definitely be a great start!
Yeah going by that graph it appears that ~1.5% of bugs are fixed in 24 hours, ~5% within a week, ~10% within 30 days and only ~13% of bugs are fixed at all. That leaves 87% of bugs still to be resolved.
From the graph of total bugs vs. time, it appears that ~10% of all bugs have been reported in the last 30 days. Even if all those bugs were magically fixed tomorrow, that would only be ~20% of bugs resolved within 30 days, so we can claim:
"Bugs discovered through Instabug are unlikely to be resolved within 30 days" and "1.5% of bugs discovered through Instabug are likely to be resolved within 24 hours of being reported"
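The bound above can be written out as a short Python sketch. The percentages are the commenter's approximate readings of the graphs (percent of all bugs), not exact figures from the report, and the variable names are made up for illustration.

```python
# Approximate values read off the graphs by the commenter (percent of all bugs).
resolved_within_24h = 1.5
resolved_within_week = 5.0
resolved_within_30d = 10.0
resolved_ever = 13.0
reported_in_last_30d = 10.0  # bugs too recent to have had a full 30 days

# Everything that was never resolved.
unresolved = 100.0 - resolved_ever

# Best case: every bug reported in the last 30 days still gets resolved in time.
upper_bound_30d = resolved_within_30d + reported_in_last_30d

print(f"unresolved: {unresolved:.0f}%")
print(f"best-case share resolved within 30 days: {upper_bound_30d:.0f}%")
```

Even under the most generous assumption, the share resolved within 30 days stays far below half of all bugs.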
The figure is showing percent of all bugs, not percent of resolved bugs. Likely the rest of the 100% is unresolved bugs.
The confusion on the second point hinges on "most likely". You're likely interpreting that as the expectation of resolution time, whereas they are using maximum likelihood estimation. MLE is rather useless in this case, but it is technically still correct.
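A rough sketch of why the first day can be the single most likely day even though few bugs are resolved on it. It uses the approximate cumulative percentages quoted elsewhere in this thread (~1.5% within 24 hours, ~5% within a week, ~10% within 30 days) and assumes, purely for illustration, that resolutions are spread evenly across the days inside each bucket.

```python
# (label, first_day, last_day, cumulative % of all bugs resolved by last_day)
# Percentages are approximate readings quoted in the thread, not exact data.
buckets = [
    ("day 1", 1, 1, 1.5),
    ("days 2-7", 2, 7, 5.0),
    ("days 8-30", 8, 30, 10.0),
]

per_day = {}
previous_cumulative = 0.0
for label, first_day, last_day, cumulative in buckets:
    days = last_day - first_day + 1
    # Assumption: resolutions are spread evenly over the days in a bucket.
    per_day[label] = (cumulative - previous_cumulative) / days
    previous_cumulative = cumulative

most_likely = max(per_day, key=per_day.get)
for label, share in per_day.items():
    print(f"{label:10s} ~{share:.2f}% of all bugs per day")
print("most likely single day bucket:", most_likely)
```

Under these assumptions day 1 has the highest per-day share, yet it still accounts for only ~1.5% of all bugs, which is why "most likely" reads so differently from "usually".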
You’re right! Thanks for adding your thoughts to clarify. The wording “most likely” could be confusing indeed.
How about we change it to “Bugs discovered through Instabug are most often resolved within 24 hours of being reported”? Would that be clearer? We could also state that this is percent of all bugs, not percent of resolved bugs.
This is what happens when you hire computer science graduates as data scientists: you have incorrect data that you believe is correct. On the other hand, those who hire statisticians are more successful in collecting and analyzing data.
Graphs are for asking better questions, not for making decisions.
When I use graphs to brainstorm ways to verify the existence of a problem, I have a lot better time than when we jump to conclusions. There's something a little rotten in pretty much any projection of data that you try. Building policy off of a graph is a bad, bad plan.
Such a bad plan in fact that Mark Twain has a joke about it.
Thanks for checking out the report! I'm part of the team who put it together :)
Yeah, we thought LG was interesting too. When it comes to Android, we expected Samsung to take first place tbh, but we found more bugs/user reported from LG and Google devices (Fig. 9). This could be explained by our technical user base and the popularity of Nexus devices with Android developers. So the higher proportion of bugs/user we see reported is most likely due to internal beta testing by devs.
We went into this with some expectations and were surprised by some other findings as well... like Danish being the top locale where bugs/user are reported from :D
So does this kind of mean that there are more iPhone users on Instabug? Let's say the population is 100: 70 iPhone, 10 Samsung, 10 LG, 10 Nokia. The 70 iPhone users have 70 bugs, which is 1 bug per user, and similarly there is 1 bug per Samsung user and 1 bug per Nokia user. But there are 15 bugs for the 10 LGs, which is 1.5 bugs per LG user. In short, 70 bugs on iPhone is actually the same as, or maybe better than, Samsung. The claim tells us nothing except that there are more iPhone users. The only useful information is that LG has more bugs per user. In other words, the most useful information is the percentage. I'm sure you don't have a statistician in your work environment, because if you had, he would say "let's remove that 'Most bugs are reported from iPhones' part because it makes no sense". Guys, don't hire computer science graduates as statisticians (buzzword: data scientist); hire statistics graduates as statisticians (buzzword: data scientist).
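The arithmetic in this example can be checked with a short Python sketch. The counts are the hypothetical ones from the comment above (a population of 100 users), not data from the report, and the names `users` and `bugs` are made up for illustration.

```python
# Hypothetical counts from the example above, not real Instabug data.
users = {"iPhone": 70, "Samsung": 10, "LG": 10, "Nokia": 10}
bugs = {"iPhone": 70, "Samsung": 10, "LG": 15, "Nokia": 10}

total_bugs = sum(bugs.values())

# Share of all bug reports: dominated by whichever device has the most users.
share_of_bugs = {device: bugs[device] / total_bugs for device in bugs}

# Bugs per user: normalized by population, so devices are comparable.
bugs_per_user = {device: bugs[device] / users[device] for device in bugs}

most_reports = max(share_of_bugs, key=share_of_bugs.get)
highest_rate = max(bugs_per_user, key=bugs_per_user.get)

for device in users:
    print(f"{device:8s} share={share_of_bugs[device]:.1%} per_user={bugs_per_user[device]:.2f}")
print("most reports:", most_reports)
print("highest bugs/user:", highest_rate)
```

The two rankings disagree: the device with the most reports is simply the one with the most users, while the normalized rate singles out a different device.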
Interesting! Could be. Our analysis is based on what we know about our users' behavior but certainly not definitive. All the data here is open to interpretation.
"Errors discovered through Instabug are most likely to be resolved within 24 hours of being reported" is one of the TL;DR points, but only ~1.5% of bugs are resolved within 24 hours.