by snug 239 days ago
Update from the dev:

> Unfortunately the extension requires quite a large database (~15TB) - and it costs money.

> The ad/changelog was supposed to show only on browser restart - i.e. be much less intrusive.

> The idea behind paid features was also to cover the costs (since donations became smaller than the hosting costs).

> I messed up with implementation though.

> Sorry.

I do feel for the developer, and I'm not against asking for donations. I don't think the full-page pop-up on browser restart is terrible either, but it would have been better to show a changelog with a donation button. The ads injected directly into YouTube make me lose a lot of trust.


> the extension requires quite a large database (~15TB)

Maybe I am missing something but how does a database which just needs to store video ID and a number become 15TB in size?

Also a user ID, which seems to be 36 base64 characters (can't have one user count for multiple votes).

Round up to 500 raw bytes per row (perhaps including time/ip and other random garbage, plus indexes), 3x replication/redundancy or something, for 6 million users each having voted on 500 videos, and you're at about 4.5TB; still some ways off from 15TB, but not insurmountably far.

(votes/user is rather tricky to get; but, as a bit of random garbage statistics math: YT gets ~5B views/day and has ~3B users; 6M downloads of the extension means ~0.2% of users use it, so 10M extension-user views/day = 15B over 4 years, or 2.5K/user; assuming a 20% vote rate (rather high, but let's say extension users care more about voting and/or watch YT more than an average person), that's 500 votes/user)
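
For anyone who wants to poke at the estimate, here is the same back-of-envelope math as a small script. Every input is one of the rough guesses from the comment above, not a measured value, and the function names are made up for illustration. Multiplying the stated inputs straight through gives about 4.5 TB, the same order of magnitude as the figure above.

```python
# Back-of-envelope inputs quoted in the comment above; all are rough guesses.
YT_VIEWS_PER_DAY = 5e9
YT_USERS = 3e9
EXTENSION_USERS = 6e6
YEARS = 4
VOTE_RATE = 0.20
BYTES_PER_ROW = 500
REPLICATION = 3


def votes_per_user():
    """Estimated votes cast per extension user over the whole period."""
    share = EXTENSION_USERS / YT_USERS            # ~0.2% of YouTube users
    ext_views_per_day = YT_VIEWS_PER_DAY * share  # ~10M views/day
    total_views = ext_views_per_day * 365 * YEARS
    return total_views / EXTENSION_USERS * VOTE_RATE


def storage_tb(votes_each=500):
    """Total storage in TB (10**12 bytes), using the rounded 500 votes/user."""
    rows = EXTENSION_USERS * votes_each
    return rows * BYTES_PER_ROW * REPLICATION / 1e12


print(f"votes per user: {votes_per_user():.0f}")
print(f"storage: {storage_tb():.1f} TB")
```

Dropping the row size to 200 or 100 bytes via `BYTES_PER_ROW` scales the result down linearly, which is the crux of the disagreement further down the thread.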

500 bytes? A user ID couldn't be more than 8, a date is another 8, a video ID is another 8, and an IP is 16. Even if you assume there is some overhead, a database cannot possibly need more than 100 bytes per row.
That's assuming all of those are stored in packed formats, but even then it's not that low.
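
The fixed-width row described above (8 + 8 + 8 + 16 bytes) can be sketched with Python's `struct` module. The field order and the sample values are assumptions for illustration only; nothing here is known about the extension's real schema.

```python
import struct

# Hypothetical packed row: user ID, Unix timestamp, video row ID, IP address.
# "<" means little-endian with no alignment padding between fields.
ROW = struct.Struct("<QqQ16s")

print(ROW.size)  # 40

# Sample values only; the 16-byte IP field is left as all zeros here.
packed = ROW.pack(12345, 1_700_000_000, 987654321, bytes(16))
user_id, timestamp, video_row_id, ip_raw = ROW.unpack(packed)
```

That is 40 bytes of payload per row before any per-row header, page slack, or index copies the database adds on top.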

I already mentioned that the user IDs are 36 base64 chars, or 27 bytes if you store them max-packed; YT video IDs are 11 base64 chars, so 66 bits, which doesn't quite fit in 8 bytes (not to mention that trying to pack the video IDs would mean your db becomes useless if YouTube suddenly added a new video ID format). The IP needs another bit somewhere for IPv4 vs IPv6, so likely 24 bytes (or just a string could be used).
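
The bit counts above follow from each base64 character carrying 6 bits. A quick check, with a helper name invented for this example:

```python
import math

BITS_PER_BASE64_CHAR = 6  # 64 possible symbols = 2**6


def packed_size(num_chars):
    """Return (bits, bytes) needed to store num_chars base64 characters packed."""
    bits = num_chars * BITS_PER_BASE64_CHAR
    return bits, math.ceil(bits / 8)


print(packed_size(36))  # (216, 27)  user ID
print(packed_size(11))  # (66, 9)    video ID
```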

Then you have some overhead for padding and string field lengths, and whatever overhead for packing the data for the disk storage (padding for all entries to stay within a page? maintaining a percentage of free space to ensure the B-tree property?). Then you have a copy of the fields in indexes, with whatever overhead those come with.

Granted, even that's probably around 200 bytes, not 500, in a reasonable db, but who's to say that the db used is a reasonable & well-configured one; of course it's possible that a bunch more metadata is stored for user trustworthiness statistics or something, or duplicated tables where relations would work.

> user IDs are 36 base64 chars, or 27 bytes if you store them max-packed

Stupid. You aren't going to have 2^216 users, why do you need that many user IDs? A 64-bit integer is already overkill.

>YT video IDs are 11 base64 chars, so 66 bits, which doesn't quite fit in 8 bytes (not to mention that trying to pack the video IDs would mean your db becomes useless if YouTube suddenly added a new video ID format)

Your video table has a 64-bit integral row ID. You have a column that is a foreign key to it. Join on them.
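
A minimal sketch of that layout using SQLite from Python. The table names, column names, and the sample video ID are all made up for illustration, and the extension's actual backend is not known from this thread. The YouTube ID is stored once as text in `videos`, and each vote row carries only an integer foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (
    id         INTEGER PRIMARY KEY,   -- 64-bit surrogate key
    youtube_id TEXT NOT NULL UNIQUE   -- stored once, any future ID format fits
);
CREATE TABLE votes (
    user_id  INTEGER NOT NULL,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    vote     INTEGER NOT NULL CHECK (vote IN (-1, 1)),
    PRIMARY KEY (user_id, video_id)   -- one vote per user per video
) WITHOUT ROWID;
""")

conn.execute("INSERT INTO videos (youtube_id) VALUES (?)", ("abcdefghijk",))
video_row_id = conn.execute(
    "SELECT id FROM videos WHERE youtube_id = ?", ("abcdefghijk",)
).fetchone()[0]
conn.executemany(
    "INSERT INTO votes (user_id, video_id, vote) VALUES (?, ?, ?)",
    [(1, video_row_id, 1), (2, video_row_id, -1), (3, video_row_id, -1)],
)

# Join back to the text ID only when reporting.
rows = conn.execute("""
    SELECT v.youtube_id, SUM(t.vote = 1), SUM(t.vote = -1)
    FROM votes AS t JOIN videos AS v ON v.id = t.video_id
    GROUP BY v.id, v.youtube_id
""").fetchall()
print(rows)  # [('abcdefghijk', 1, 2)]
```

The composite primary key also enforces the "can't have one user count for multiple votes" rule without a separate index.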

>The IP needs another bit somewhere for IPv4 vs IPv6, so likely 24 bytes (or just a string could be used).

All IPv4 addresses can be encoded as IPv6 addresses so this only requires 16 bytes.
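
This refers to IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`). A small sketch with Python's standard `ipaddress` module; the two helper names are invented for this example:

```python
import ipaddress


def to_16_bytes(text):
    """Encode an IPv4 or IPv6 address string as exactly 16 bytes."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # IPv4-mapped IPv6: 80 zero bits, 16 one bits, then the 4 IPv4 bytes.
        return b"\x00" * 10 + b"\xff\xff" + addr.packed
    return addr.packed


def from_16_bytes(raw):
    """Decode 16 bytes back to an IPv4Address or IPv6Address."""
    addr = ipaddress.IPv6Address(raw)
    mapped = addr.ipv4_mapped  # None unless the address is IPv4-mapped
    return mapped if mapped is not None else addr


for sample in ("192.0.2.1", "2001:db8::1"):
    raw = to_16_bytes(sample)
    print(len(raw), from_16_bytes(raw))
```

No extra flag bit is needed because the `::ffff:0:0/96` prefix itself marks the value as IPv4.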

>Then you have some overhead for padding and string field lengths,

None of these things are strings.

>and whatever overhead for packing the data for the disk storage (padding for all entries to stay within a page? maintaining a percentage of free space to ensure the B-tree property?).

This is pretty low overhead. It won't take you to hundreds of bytes per row.

>Granted, even that's probably around 200 bytes, not 500, in a reasonable db, but who's to say that the db used is a reasonable & well-configured one; of course it's possible that a bunch more metadata is stored for user trustworthiness statistics or something, or duplicated tables where relations would work.

If it stores IDs as strings then the DB probably won't be set up correctly either. That would be clearly wrong and wrong people are usually wrong about other things too.

Nice as it would be, your level of perfectionism is unfortunately not particularly common; indeed, it's possible to do things much more efficiently, but for most purposes "it works" is enough; and by the time it stops being enough, you already have terabytes of db, and just adding more disk is much easier than the hassle of migrating the entire thing to something different.
Why not summarize every 3 months? It would allow someone to downvote the same video again every 3 months, but it's easier to just install this extension on another profile.

It would bring the size down to under 1TB and allow the developer to go ad-free. Hope this message reaches him.
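
One way to read "summarize" is to fold raw vote rows older than a cutoff into per-video counters and then delete them. A rough sketch of that idea, where the row shape `(user_id, video_id, vote, timestamp)` and the function name are assumptions made for illustration:

```python
from collections import Counter


def roll_up(raw_votes, totals, cutoff):
    """Fold votes older than `cutoff` into `totals`; return the rows to keep.

    `totals` maps (video_id, vote) -> count and is updated in place.
    """
    kept = []
    for user_id, video_id, vote, timestamp in raw_votes:
        if timestamp < cutoff:
            totals[(video_id, vote)] += 1  # per-user detail is dropped here
        else:
            kept.append((user_id, video_id, vote, timestamp))
    return kept


totals = Counter()
raw = [
    (1, "vidA", -1, 10),
    (2, "vidA", -1, 20),
    (3, "vidA", 1, 95),
    (1, "vidB", 1, 5),
]
recent = roll_up(raw, totals, cutoff=90)
print(recent)        # [(3, 'vidA', 1, 95)]
print(dict(totals))  # {('vidA', -1): 2, ('vidB', 1): 1}
```

Once a row is folded in, nothing records which user cast it, so the same user voting again after the cutoff would be counted a second time.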

That'd easily result in accidental re-votes if a user watches the same video every couple months (and such re-watchers would likely be ones that.. like the video, thereby skewing data away from dislikes over time).

Especially if the extension just sends the vote status instead of only reacting on a press (which'd allow it to send forward a vote done originally on, say, mobile; don't know if it does this, but it seems like a useful thing to do).

The way the plugin works (in my simplified understanding) is that it guesses how many dislikes there are based on the like/dislike ratio of the people that have the plugin installed. So if 100 people with the plugin installed voted on a video with a 90/10 like/dislike split, and the actual video has 1000 likes, it will say that there are about 110 dislikes (1000 × 10/90). YouTube not only took away the dislike UI, but also stopped publicly giving the number of dislikes, even through the API.
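
The simplified model described above amounts to scaling the public like count by the dislike-to-like ratio seen among extension users. This is only a sketch of that description, not the extension's actual algorithm, and the function name is invented:

```python
def estimate_dislikes(public_likes, ext_likes, ext_dislikes):
    """Extrapolate total dislikes from the extension users' vote ratio.

    Returns None when no extension user has liked the video, since the
    ratio is undefined in that case.
    """
    if ext_likes == 0:
        return None
    return public_likes * ext_dislikes / ext_likes


# 100 extension users split 90/10, video shows 1000 public likes.
print(round(estimate_dislikes(1000, 90, 10)))  # 111
```

The estimate is only as good as the assumption that extension users vote like everyone else, which other replies in this thread dispute.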

But even then, the database could not get that big. You'd only need a few simple tables: one that tracks each plugin user's like/dislike per video, and one that holds the aggregations. 15TB sounds crazy.

I'm not a youtuber so idk what content creators could see, but it would have been smarter for them to go after the content creators that have the plugin installed instead of youtube users, not sure why we would care about those kinds of analytics

It's not a representative sample, so the dislikes it shows aren't accurate; it's a bad estimate. I also heard some content creators say they compared it with their real dislike numbers and it was way off.
I wasn't really upset about the removal of the button but this add-on seems superfluous. What benefit does it give users to see how many other users of this extension disliked a video? I would understand if it helped shape your recommendations or home page feed but I'm at a loss here.
Imagine a video is recommended that is for a specific how-to search; if it has a poor rating then you can be confident it's a bad match for your search.

E.g. a plumbing video for fixing a tap with a bad rating is unlikely to actually tell you how to do so.

I mean, isn't that the point of comments? I have a hard time believing a video can have high likes and not a single higher up comment countering it. At least in a realistic example like household repair or something. I also tend to skim videos for content as well to verify or find it. So maybe I'm just more diligent than most.
Video creators can freely delete comments, which makes relying on them for much of anything sketchy.
It could keep track of each unique dislike. Then maybe the best we can do is use HyperLogLog (or HyperLogLog++ or HyperLogLogLog) per video id.
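
For reference, a bare-bones HyperLogLog looks roughly like the sketch below: plain HyperLogLog with the small-range (linear counting) correction, not HyperLogLog++. The class name, the choice of SHA-1 as the hash, and the 4096 one-byte registers are assumptions picked for the example. It estimates the number of distinct IDs added, and adding the same ID twice changes nothing.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter with 2**p one-byte registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                       # number of registers (4096)
        self.registers = bytearray(self.m)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")         # 64-bit hash value
        index = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for _ in range(2):                  # second pass adds only duplicates
    for i in range(50_000):
        hll.add(f"user-{i}")
print(round(hll.count()))           # close to 50000, typically within a few percent
```

With these settings each sketch is a fixed 4 KB no matter how many users vote, at the cost of a standard error of roughly 1.04/sqrt(4096), about 1.6%.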
Even if it's tracking each unique dislike, I can't see how it would explode to 15TB of data.
I don't feel for the developer. If donations are smaller than hosting costs, just stop putting time into it. It's a stupid gimmick anyway.
I consider it to be more valuable than a "stupid gimmick", even if more in message than in direct utility - it fights to retain peer-voting and crowdsourced evaluation on the quality of online information, which is something that seems to be dwindling away with each passing year.
The problem is that the number you see is pointless. There's a limited demographic who cares about the dislike numbers so much they go out and install a browser extension, which in turn biases how they react to a video.

It's basically an echo chamber extension.

Not sure what you mean by "fight back", either. Google 100% could not care less if you install this extension. They aren't bringing it back unless the division's leadership has a change of heart on whatever internal goals they wanted to achieve by removing it.

I tried to make Twitter and Reddit more usable as they started to enshittify, but after a certain point I realized it was a fundamentally losing battle and gave them up altogether. Those sites were not one or two features away from becoming usable again, the rot was more comprehensive. Today in hindsight, I can see that moving on was the only way to go.
Correct, that's why I called it a stupid gimmick. You're trying to fight them on their turf, it'll consume your life and they will wipe you out in a second if they feel like it.
15TB smells of incompetence. Putting ads, even more. Fucking up the ad displaying logic as he says: even more...