by colejohnson66 1115 days ago
Strictly speaking, sure. But calling every data-extrapolated result a "guess" wrongly, IMO, lumps it in with guesses based on no evidence.
2 comments

I'm sorry, but the extension has a grand total of 14k users.

There are some 368 million DAILY active users on YouTube.

It is making claims based on a dataset of roughly 0.0003% of the population of users.

It's a GUESS. A bad one at that, since the people who install that extension are absolutely not representative of the general YouTube user.

If we expand it out to the 2.28 BILLION monthly active userbase... the data from the 14k users is basically meaningless.

---

Think of it this way - if you were seconds in the day, those extension users are 25 seconds. If I were to try to measure any sort of meaningful data in a day by using 25 seconds of data, I would likely be horribly, horribly wrong.

Ex: My water company billed me and it's bullshit, I've been carefully tracking usage data for 30 seconds after I wake up every day, and I never measure any usage! Why are they billing me?

Holy cow, I measured our water usage today and we used a whole gallon over the 25 seconds I measured!!! We're blowing through nearly 3000 gallons a day!

---

Both are horribly, horribly wrong estimates. A sample size that small is not very valuable.
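
For anyone who wants to check the arithmetic in the comment above, here is a minimal sketch that takes its figures at face value (14k extension users, 368 million daily and 2.28 billion monthly active users). The helper names `share_percent` and `extrapolate_daily` are made up for illustration, and none of the numbers are measured data.

```python
# Figures as quoted in the comment above; nothing here is measured data.
EXTENSION_USERS = 14_000          # user count claimed by the commenter
DAILY_ACTIVE = 368_000_000        # daily active users, as quoted
MONTHLY_ACTIVE = 2_280_000_000    # monthly active users, as quoted
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400


def share_percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def extrapolate_daily(amount: float, window_seconds: float) -> float:
    """Naively scale an amount seen in a short window up to a full day."""
    return amount * SECONDS_PER_DAY / window_seconds


if __name__ == "__main__":
    print(f"share of daily users:   {share_percent(EXTENSION_USERS, DAILY_ACTIVE):.4f}%")
    print(f"share of monthly users: {share_percent(EXTENSION_USERS, MONTHLY_ACTIVE):.4f}%")
    print(f"25 seconds of a day:    {share_percent(25, SECONDS_PER_DAY):.4f}%")
    # The water-meter analogy: one gallon seen in a 25-second window.
    print(f"gallons/day from 1 gal in 25 s: {extrapolate_daily(1, 25):.0f}")
    # ...and zero usage seen in a 30-second window.
    print(f"gallons/day from 0 gal in 30 s: {extrapolate_daily(0, 30):.0f}")
```

With those inputs, 14k users come to roughly 0.0038% of the daily figure and 0.0006% of the monthly figure, while 25 seconds is about 0.029% of a day. One gallon in 25 seconds extrapolates to 3456 gallons a day.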

Keep in mind that the type of people who are going to use this extension will also likely only view a specific domain of video content. While yes, it'll be a very small sample size on the whole, those users will still be representative of the broad strokes for that kind of content.

Like, let's say that the audience is specifically going to be interested in tech content (not too big of a stretch). With tech content, there are a couple of standout creators that are... at least somewhat universally interesting/viewed (e.g. Tom Scott). As a result, you can fairly reliably conclude that any dislike count on those creators will be at least percentage-wise accurate enough. OTOH, let's say that this audience is not interested at all in "prank videos". (This is a personal bias - this is something I cannot stand myself.) Those videos will have less registered data on the backend, so the dislike counter for those videos will be less accurate, but for the audience that has this extension installed it won't matter.

Others have already pointed out that the extension has about half a million users already, but even if it were as low as you are suggesting, it can still be very useful for that specific kind of content.

I don't think anyone is doing serious usage analysis on dislike/like counts with the data from this extension; people just like having a general idea of what the ratio is.

https://chrome.google.com/webstore/detail/return-youtube-dis...

https://addons.mozilla.org/en-US/firefox/addon/return-youtub...

4,000,000+ users on Chrome with 14k reviews. Maybe you are mistaking the review count for the user count?

Even if it were just 0.0003% that's still the same sampling rate as the average Gallup poll using 1000 people to represent the USA's 300,000,000+ population.
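
A rough sketch of why a poll of 1000 can work: for a simple random sample, the usual margin-of-error formula depends on the sample size, not on the fraction of the population sampled. The function names here are illustrative, and the formula only applies if the sample really is random, which is exactly what others in this thread dispute for extension users.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    p = 0.5 is the worst case; z = 1.96 corresponds to ~95% confidence.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


def with_finite_population(n: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """Same margin with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return margin_of_error(n, p, z) * fpc


if __name__ == "__main__":
    # 1000 respondents, population of 300 million (figures from the comment above).
    print(f"n=1000, no correction:   {margin_of_error(1000):.4f}")
    print(f"n=1000, pop=300,000,000: {with_finite_population(1000, 300_000_000):.4f}")
    # The same formula at n=14,000, IF those 14k were a random sample.
    print(f"n=14000, no correction:  {margin_of_error(14_000):.4f}")
```

For n=1000 this comes to about ±3.1 percentage points, and the population size barely changes it. The catch is the "random" part, not the 0.0003%.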

Where do you get 14k from? The Chrome store says 4 million users and Firefox says 400k. And an unknown number of users are using the many modded mobile YouTube clients that have it built in.

It's more valuable than having an invisible dislike count. If I found out one person with the extension (that I also use) disliked the video, that is infinitely more helpful than just having a blank dislike button with no statistics.

> Think of it this way - if you were seconds in the day, those extension users are 25 seconds. If I were to try to measure any sort of meaningful data in a day by using 25 seconds of data, I would likely be horribly, horribly wrong.

How many seconds (or fractions thereof) did you spend looking at the page? How could you have missed the actual download count if not for likely closing the tab as soon as you saw the review count, which was just to the left of it?

It doesn't matter how representative the data is of the wider userbase as long as it accurately represents the opinions of the people who use the extension, since those are the only people who see the result. The sample size is only an issue insofar as most videos won't get any votes.

It seems pretty accurate to me. Maybe the other users are similar to me.

It's not a guess, it's much worse than a guess. It's inherently biased: it collects data only from people who care about the downvote count to the extreme. They care about the count so much that they installed an extension specifically for it. Think about it.

It's not randomly sampled from the whole demographic. It's not SteamDB*.

It's no different from getting "the opinions of generic US people" from the comments below Biden/Trump's tweets.

* Technically SteamDB has a bias too: it samples from the players who made their profiles public only.
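
To make the self-selection point concrete, here is a toy calculation. Every number in it is hypothetical (the group sizes and dislike rates are invented, not taken from YouTube or the extension). It only shows that a sample drawn from one group reports that group's rate no matter how big the sample is.

```python
def dislike_rate(groups):
    """Overall dislike rate for groups given as [(viewer_count, dislike_rate), ...]."""
    total = sum(count for count, _ in groups)
    return sum(count * rate for count, rate in groups) / total


# Hypothetical video with 1,000,000 viewers (all values invented for illustration):
# 5% are extension installers who dislike 30% of the time,
# 95% are everyone else, who dislike 5% of the time.
INSTALLERS = (50_000, 0.30)
EVERYONE_ELSE = (950_000, 0.05)

true_rate = dislike_rate([INSTALLERS, EVERYONE_ELSE])

# A sample made up only of installers sees the installer rate,
# whether it contains 14 thousand people or 4 million.
small_biased = dislike_rate([(14_000, 0.30)])
large_biased = dislike_rate([(4_000_000, 0.30)])

if __name__ == "__main__":
    print(f"true rate across all viewers: {true_rate:.4f}")
    print(f"installer-only sample (14k):  {small_biased:.4f}")
    print(f"installer-only sample (4M):   {large_biased:.4f}")
```

In this made-up setup the true rate is 6.25% while both installer-only samples report 30%. A bigger sample shrinks the noise but not the bias.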