Hacker News
by anony23 1383 days ago
What purpose does it serve?

They need to run a JavaScript function to download YouTube videos at normal speeds.

Edit: it's also required to download music; otherwise, it just fails.

Source:

- https://github.com/ytdl-org/youtube-dl/issues/29326#issuecom...

- https://github.com/ytdl-org/youtube-dl/blob/d619dd712f63aab1...

- https://github.com/ytdl-org/youtube-dl/commit/cf001636600430...

Wow:

   Overview of the control flow (already known):
   The Youtube API provides you with n - your video access token
   If their new changes apply to your client (they do for "web") then it is expected your client will modify n based on internal logic. This logic is inside player...base.js
   n is modified by a cryptic function
   Modified n is sent back to server as proof that we're an official client. If you send n unmodified, the server will eventually throttle you.
So they can change the function at any time to keep you on your toes, which means you need to be able to run semi-arbitrary JS to keep using the API.
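A minimal Python sketch of the client side of the flow quoted above (Python because youtube-dl is written in it). Two loud assumptions: `hypothetical_n_transform` is a made-up stand-in for the obfuscated function in base.js, and `n` is assumed to travel as a query parameter on the media URL; the example URL is invented. The point is only the shape: read `n`, run it through whatever function the current player defines, and write the result back.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def hypothetical_n_transform(n: str) -> str:
    """Made-up stand-in for the obfuscated function in player...base.js.

    The real function is generated by YouTube and changes over time; this toy
    version (reverse, then rotate left by 3) exists only for illustration.
    """
    reversed_n = n[::-1]
    shift = 3 % len(reversed_n) if reversed_n else 0
    return reversed_n[shift:] + reversed_n[:shift]


def rewrite_media_url(url: str, transform=hypothetical_n_transform) -> str:
    """Return `url` with its `n` query parameter replaced by transform(n).

    Assumption: `n` is a query parameter on the media URL. URLs without it
    come back with their query string re-encoded but otherwise unchanged.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if "n" in query:
        # Per the quote, sending n unmodified eventually gets you throttled.
        query["n"] = [transform(query["n"][0])]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


print(rewrite_media_url("https://example.invalid/videoplayback?id=abc&n=abcdef"))
```

Because the transform is passed in as a parameter, plugging in a freshly extracted function is the only change needed when the player updates, and producing that function is exactly the part that requires executing YouTube's JS.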

Waste of human brainpower, but I guess that energy is better spent imagining a world where Google isn't in charge than kvetching about what they're doing with their influence.

There is a reason Google is able to serve that much video bandwidth, and also a reason there are no worthwhile YouTube clones. Some amount of scrape protection is absolutely essential.
Seems like they ultimately failed: youtube-dl is freely available as a pip package, so anyone with scraping intent could have used it.
I'd have to read up on the specifics as well, but I think YouTube basically uses a lot of obfuscated, rapidly and automatically changing JavaScript code to fetch the video data. A project like youtube-dl has to run this code to be able to download videos, because that's what happens in the browser as well.
For those interested: at several points over the past few weeks, youtube-dl stopped working for multiple hours at a time, and it was precisely because of this code.

We have a custom-made Discord music bot on our server that uses ytdl to stream songs so we can listen together. At one point, while we were listening, we suddenly got some obscure JavaScript error.

We began joking that there was some bug in the code that broke it after 6PM, but later found out that Google had changed some of the obfuscated JS, which broke that part of the code and prevented us from fetching the song information.

If you start a YouTube video, pause it, and resume a few days later, you'll notice that the page plays for ~30 seconds (i.e. what's buffered) and then refreshes. I'd guess this refresh is to pick up the new JavaScript and any updates to the HTML.

It's kinda annoying if you have a lot of youtube tabs open for a long time and come back to them.

What's interesting is that it seems to be a constant game of cat and mouse. I download a YT vid. It crawls. Update yt-dlp, it flies again. I love yt-dlp and use it a lot.
But why not just use a normal JS engine called from Python?
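One way to do that, as a sketch: shell out to Node.js and let it evaluate the extracted function. This assumes a `node` binary on PATH, and `call_js_function` is a hypothetical helper, not part of youtube-dl or yt-dlp.

```python
import json
import shutil
import subprocess


def call_js_function(js_source: str, arg: str, timeout: float = 10.0) -> str:
    """Evaluate `js_source` (a JS function expression) with Node.js, call it
    with one string argument, and return the result as a string.

    Assumes a `node` binary is on PATH.
    """
    node = shutil.which("node")
    if node is None:
        raise RuntimeError("node not found on PATH")
    # json.dumps yields a valid JS string literal for the argument.
    script = (
        f"const f = ({js_source});\n"
        f"process.stdout.write(String(f({json.dumps(arg)})));"
    )
    result = subprocess.run(
        [node, "-e", script],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the JS throws
    )
    return result.stdout
```

The trade-off is a heavyweight external dependency, plus the fact that `node -e` runs the site's code with full access to your machine, so in practice it would need sandboxing. youtube-dl instead ships its own limited JavaScript interpreter (jsinterp.py) rather than depending on an external engine.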
It's used in the YouTube extractor: https://github.com/ytdl-org/youtube-dl/blob/d619dd712f63aab1...

I believe YouTube limits your bitrate if you don't pass a specific calculated value; it's possible youtube-dl has to parse and eval JS to get it.

> I believe YouTube limits your bitrate if you don't pass a specific calculated value

It's starting to become Widevine bullshit all over again.

It's their platform. They can do with it what they want.
Many channels would be more than happy to enable download options, if possible.

Hell, how is the Creative Commons licence, which they explicitly let you select, supposed to work for videos that can't be downloaded in any way?

But would the channel owner be happy to enable download options if $0.09 per GB downloaded was subtracted from their ad revenue?
If you cite a price that high for bulk data, a "no" won't prove anything. Try asking about a competitive price.

For ballpark numbers, YouTube dedicates 1200 kbps to 1080p videos in VP9. Let's say we have a 10-minute video with an RPM (revenue per 1,000 views) of $3.

We can arrange a CDN to deliver files at $0.005 per GB without much effort, and that's at a very small scale; the price drops a lot as things get bigger. So I'll use that number, noting that it's generous to Google.

So that's 0.3 cents of revenue per watch ($3 / 1,000 views), and each watch is 90 MB of data (1200 kbps × 600 s) that would cost 0.045 cents to deliver.

One view would pay for about 7 downloads. And how many downloads are we likely to see? Probably under 10% of viewers.
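The arithmetic above, written out so the assumptions can be swapped. The 1200 kbps bitrate, 10-minute length, $3 RPM, and $0.005/GB delivery cost are the commenter's ballpark figures, not measured data; units are decimal (1 GB = 1000 MB).

```python
def revenue_vs_delivery_cost(
    bitrate_kbps: float = 1200,
    duration_s: float = 600,
    rpm_usd: float = 3.0,
    cost_per_gb_usd: float = 0.005,
) -> dict:
    """Back-of-envelope figures from the comment above, in decimal units."""
    size_mb = bitrate_kbps * duration_s / 8 / 1000        # kilobits -> megabytes
    revenue_per_view = rpm_usd / 1000                      # RPM = revenue per 1000 views
    cost_per_download = size_mb / 1000 * cost_per_gb_usd   # MB -> GB, then price
    return {
        "size_mb": size_mb,
        "revenue_cents_per_view": revenue_per_view * 100,
        "cost_cents_per_download": cost_per_download * 100,
        "downloads_paid_per_view": revenue_per_view / cost_per_download,
    }


print(revenue_vs_delivery_cost())
```

With the defaults this gives 90 MB, 0.3 cents per view, 0.045 cents per download, and about 6.67 downloads paid for by one view. Passing `cost_per_gb_usd=0.09` (the figure from the earlier comment) gives about 0.81 cents per download, more than the 0.3 cents a view earns, which is why the two comments reach opposite conclusions.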

I'd turn that option on.

They've also chosen to be a monopoly.
Just because they have the right to do it doesn't make it right.
It’s sort of an extension of the state / surveillance.
It's their platform but it's also a web site and that comes with certain expectations of interoperability.
You need to run some obfuscated JS to get decent download speeds from YouTube. Something along the lines of PoW (proof of work).
It’s not like proof of work at all. It’s just a challenge and response: YouTube includes a random number in the webpage for each video and expects to see a request parameter with a particular value calculated from that random number when you request the video. If you don’t do the arithmetic, it throttles you to 50kb/s.

Since the calculation of the response is done in JS, and they occasionally change the formula, some download programs are moving towards running the JS rather than trying to keep up with the changes.
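A sketch of the server side of the challenge and response described above, showing why it isn't proof of work: checking the answer costs the server the same trivial effort as computing it, and nothing forces the client to burn CPU. `hypothetical_formula` is made up (the real one is not public and changes), and the 50 kb/s cap is the figure from the comment, not something verified here.

```python
import secrets
from typing import Optional

THROTTLED_KBPS = 50  # throttle figure quoted in the comment above


def hypothetical_formula(challenge: str) -> str:
    """Made-up stand-in for YouTube's formula, which is not public and changes."""
    return challenge[::-1]


def issue_challenge() -> str:
    """Random per-video value the server would embed in the watch page."""
    return secrets.token_urlsafe(12)


def rate_limit_kbps(challenge: str, response: Optional[str]) -> Optional[int]:
    """Return the cap to apply, or None for no artificial cap.

    Verification is one cheap recomputation: unlike proof of work, there is
    no difficulty knob and no CPU cost imposed on the client.
    """
    if response is not None and response == hypothetical_formula(challenge):
        return None
    return THROTTLED_KBPS
```

Since any client that knows the current formula gets full speed, the server's only lever is changing the formula, which is the cat and mouse described throughout this thread.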

It’s really just bullshit to make people’s lives harder.

The next step will probably be moving the calculation to WebAssembly, or requiring the script to fetch the result via WebSocket or WebRTC...
Pirate determination is a thing to behold, as are crazed, repetitive digital grabs. It's not a fair or accurate characterization to dismiss it as "making people's lives harder". It is remarkable that the Debian distros now include ytdl; let's do what is reasonable to keep it going.
You can’t exactly pirate a youtube video, since they’re all publicly available.
That's not really how piracy works. I say this as an advocate of it.
YouTube PM: We need to stop youtube-dl.

Engineers: make a half-arsed attempt.

IIRC it's used to extract/generate the signatures needed for YouTube media URLs.
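To illustrate why an extractor needs to evaluate at least some JS for signatures, here is a toy statement interpreter. It is modeled loosely on how older players' signature-scrambling functions have reportedly been described (reverse the character array, drop leading characters with `splice`, swap the first character with another); the statement syntax, the `swap` helper, and `run_scramble` are all hypothetical simplifications, not youtube-dl's jsinterp or real base.js code.

```python
import re
from typing import List

# Patterns for one statement each of a hypothetical, heavily simplified
# scrambling function; real base.js code is obfuscated and far less regular.
_REVERSE = re.compile(r"a\.reverse\(\)")
_SPLICE = re.compile(r"a\.splice\(0,\s*(\d+)\)")
_SWAP = re.compile(r"swap\(a,\s*(\d+)\)")


def _swap(chars: List[str], k: int) -> None:
    """Swap the first element with the one at index k mod len (in place)."""
    if not chars:
        return
    i = k % len(chars)
    chars[0], chars[i] = chars[i], chars[0]


def run_scramble(body: str, signature: str) -> str:
    """Apply each ';'-separated statement in `body` to the signature's chars."""
    chars = list(signature)
    for stmt in filter(None, (s.strip() for s in body.split(";"))):
        if _REVERSE.fullmatch(stmt):
            chars.reverse()
        elif m := _SPLICE.fullmatch(stmt):
            # JS a.splice(0, k) removes the first k elements in place.
            del chars[: int(m.group(1))]
        elif m := _SWAP.fullmatch(stmt):
            _swap(chars, int(m.group(1)))
        else:
            raise ValueError(f"unsupported statement: {stmt!r}")
    return "".join(chars)


print(run_scramble("a.reverse();a.splice(0,2);swap(a,3)", "abcdefgh"))
```

The moment the player emits a statement outside the supported set, this raises instead of guessing, which mirrors the breakage people in this thread saw whenever Google changed the obfuscated JS.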