Reverse-Engineering YouTube: Revisited | HN Mirror


	Reverse-Engineering YouTube: Revisited (tyrrrz.me)
	208 points by tyrrrz 1232 days ago

12 comments

rhn_mk1 1232 days ago

> There is one thing that developers like more than building things — and that is breaking things built by other people.

Haha. This is not as universal as the author thinks. Every time I need to reverse-engineer something obscured on purpose, I wish we could just get along.

Every time I have to reverse-engineer something obscured by accident, I call it debugging.

But even if I solve the puzzle, it's like solving crosswords: I've just defeated a human mind, and the victory is transient and will soon be forgotten. I'd prefer my victories to be against the frontier of knowledge, and to win universal truths. That means building things rather than tearing down what other humans built.

I just wish there were more mathematical certainty and fewer human vices in programming.

pixl97 1232 days ago

Unfortunately, the halting problem takes all your mathematical certainty and throws it out the window. It's very easy to turn an application that halts within a finite amount of time into one that doesn't. You'll find that most programmers and companies won't spend the massive amount of time needed to ensure their logic is correct; instead they ship the application quickly and fix it based on crashes and feedback.

rhn_mk1 1232 days ago

Mathematical certainty is what to leverage, not what to fight. You'd use it before you run into the halting problem, not after, just as mathematics was used to discover the halting problem in the first place.

And what you're describing as happening in practice is precisely the disappointing part of programming.

nocsi 1230 days ago

Yeah, the fix-it-later approach is an excuse that software engineers get to enjoy. Civil engineers are liable for their mistakes and face fines or sanctions for their work. Meanwhile, software engineers can get away with half-assed logic or mishandled data and nothing comes of it.

In South Korea, a company with known software vulnerabilities is fined every day until it fixes them. That gives companies an incentive to make sure software does the right thing before it ships.

philipphutterer 1232 days ago

> I'd prefer my victories to be against the frontier of knowledge, and to win universal truths.

You wouldn't need to tear down barriers if the people that built them thought the same in the first place. Nonetheless, keep up that attitude.

nirav72 1232 days ago

I don’t particularly have anything to add about the article, but I do enjoy using your desktop YouTube downloader, as well as a couple of your .NET libraries, especially CliWrap. Amazing work. Just wanted to say thanks!

tyrrrz 1232 days ago

Glad to hear that :)

thrdbndndn 1232 days ago

This is also a good place to learn about it: https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extracto...

2h 1232 days ago

I've said it before on GitHub: I don't think `TVHTML5_SIMPLY_EMBEDDED_PLAYER` is the great solution everyone thinks it is. Yeah, you can get the age-restricted videos anonymously. However you can also get those videos by logging in, which the author doesn't mention:

    POST /youtubei/v1/player HTTP/1.1
    Host: www.youtube.com
    Authorization: Bearer ya29.a0AVvZVsqRwNWFI3R0MSxnugyNlxbqIOXcwXkeA6NMOcpv_...
    
    {
     "contentCheckOk": true,
     "context": {
      "client": {
       "clientName": "ANDROID",
       "clientVersion": "18.04.35"
      }
     },
     "racyCheckOk": true,
     "videoId": "Cr381pDsSsA"
    }

and `TVHTML5_SIMPLY_EMBEDDED_PLAYER` comes with strong drawbacks. Some videos under that client require a JavaScript signature for BOTH downloading and unthrottling. Each person is welcome to their own opinion, but I just don't think it's worth the complexity of parsing arbitrary JavaScript with Python when you can just log in (programmatically, as above). Personally, I use the ANDROID client, which avoids all JavaScript signatures.

Also not mentioned in the article is that you can actually take the throttled URLs as is, and download pieces concurrently for a pretty good result. Each piece still downloads slowly, but if you use on the order of 99 connections, you get decent speed. You would think you'd get IP-blocked or something for this, but I downloaded quite a bit using this method as a test and the YouTube server allowed it. The combined speed was only something like 2 MB/s, so in the big picture it doesn't seem like abuse.

My YouTube OAuth code is here for anyone interested:

http://2a.pages.dev/mech
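A rough Python sketch of the concurrent approach described in that comment: split the stream into byte ranges, fetch them in parallel, and reassemble them in order. `fetch_range` is a hypothetical callback standing in for whatever HTTP request fetches one piece of the throttled URL; the function names, the chunk size, and the default of 16 workers are assumptions, not values from the comment (which mentions on the order of 99 connections).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def plan_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def download_concurrently(
    fetch_range: Callable[[int, int], bytes],
    total_size: int,
    chunk_size: int,
    max_workers: int = 16,
) -> bytes:
    """Fetch every range in parallel and join the pieces in their original order.

    fetch_range(start, end) is a hypothetical callback that must return exactly
    bytes start..end (inclusive) of the stream.
    """
    ranges = plan_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of which
        # request finishes first, so the pieces can simply be concatenated.
        data = b"".join(pool.map(lambda r: fetch_range(*r), ranges))
    if len(data) != total_size:
        raise IOError(f"expected {total_size} bytes, got {len(data)}")
    return data
```

In practice `fetch_range` would issue one HTTP request per piece; the final length check is there because a silently truncated piece would otherwise corrupt the joined file.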

philipphutterer 1232 days ago

> However you can also get those videos by logging in, which the author doesn't mention.

> Also not mentioned in the article is that you can actually take the throttled URLs as is, and download pieces concurrently for a pretty good result.

The author mentioned both, the login option as well as the chunking mechanism. Sorry, but did you actually read the blog post?

2h 1232 days ago

they mention cookies. that's not the correct method for authenticating to the API; OAuth is.
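For reference, a minimal Python sketch that builds the OAuth-authenticated player request shown further up, using only the standard library. The endpoint, client name, client version, and JSON fields are copied from that example; the `Content-Type` header, the `build_player_request` name, and the placeholder token are assumptions. It only constructs the request: actually sending it with `urllib.request.urlopen` needs a valid access token, and the client version YouTube accepts may have changed since.

```python
import json
import urllib.request

PLAYER_ENDPOINT = "https://www.youtube.com/youtubei/v1/player"


def build_player_request(video_id: str, access_token: str,
                         client_version: str = "18.04.35") -> urllib.request.Request:
    """Build (but do not send) a player request as the ANDROID client."""
    payload = {
        "contentCheckOk": True,
        "racyCheckOk": True,
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "ANDROID",
                "clientVersion": client_version,
            }
        },
    }
    return urllib.request.Request(
        PLAYER_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The OAuth access token goes in a standard Bearer header.
            "Authorization": f"Bearer {access_token}",
            # Assumed header; the original example does not show one.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the token travels in a single header, this is the "one extra line in the HTTP request" that the thread argues about; obtaining and refreshing the token is the harder part and is not shown.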

tyrrrz 1232 days ago

If you can afford to always be logged in, then sure, but it's not always an option. Especially if you need a general solution.

2h 1232 days ago

who said anything about always? you log in as needed. most videos are open.

tyrrrz 1231 days ago

I mean, if you're not the sole user of the tool, you can't guarantee that everyone can log in (or would want to)

2h 1231 days ago

I doubt anyone WANTS to log in. 90% of videos are open. Yes, logging in is a burden, but coding-wise it's literally one extra line in the HTTP request. I think that's a fair tradeoff compared to parsing arbitrary JavaScript.

Each person can weigh the pros and cons and make their own decision, but I don't think it's as black and white as `TVHTML5_SIMPLY_EMBEDDED_PLAYER` being the best option. The article doesn't even discuss OAuth, so I don't think it presents a balanced take on the different approaches.

zxcvbn4038 1232 days ago

If you really want to understand how streaming video works, then it definitely takes you down a couple of rabbit holes - but it’s worth it. I think more people and companies should try to stream their own video content rather than be at the mercy of Google, their algorithms, and their censorship. You don’t have to “be” YouTube and host other users’ content, but you should be able to host your content without YouTube’s approval.

petra 1232 days ago

There are probably many paid video hosting platforms. You can't save that much by hosting it yourself.

Anyone who is hosting on YouTube is looking for a free service.

mikae1 1232 days ago

> Anyone who is hosting on YouTube is looking for a free service.

https://archive.org/help/video.php is also “free”.

nyanpasu64 1232 days ago

Interestingly I found that YouTube's web UI actually requests range URLs rather than range HTTP headers, allowing it to seek around the video faster than mpv with yt-dlp (and conveniently avoiding throttling as well). I suspect this may be related to DASH: https://github.com/mpv-player/mpv/issues/10601

Unfortunately mpv and ffmpeg do not currently have mature DASH support and cannot benefit from fast seeks: https://github.com/mpv-player/mpv/issues/7033 (didn't look deeply)

2h 1232 days ago

> and conveniently avoiding throttling as well

throttling is not avoided. the YouTube web client generates a JavaScript signature that disables the throttling, same as what the code in the article does.
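For comparison, the older signatureCipher step that the article's code deals with (discussed further down the thread) is mostly URL plumbing: the field is a query string carrying the scrambled signature `s`, the name of the parameter to put the result in (`sp`), and the base `url`. A minimal Python sketch, assuming that layout; `decipher` is a hypothetical placeholder for the scrambling routine extracted from the player JavaScript, which is not reproduced here, and the separate `&n` transform is not handled at all.

```python
from typing import Callable
from urllib.parse import parse_qs, quote


def resolve_signature_cipher(signature_cipher: str,
                             decipher: Callable[[str], str]) -> str:
    """Build a downloadable URL from a signatureCipher value.

    decipher is a hypothetical callback for the signature-scrambling routine
    found in the player JavaScript; it is deliberately not implemented here.
    """
    fields = parse_qs(signature_cipher)  # percent-decodes every value
    base_url = fields["url"][0]
    scrambled = fields["s"][0]
    # Name of the query parameter that receives the deciphered signature.
    # Falling back to "signature" when sp is absent is an assumption.
    param = fields.get("sp", ["signature"])[0]
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{param}={quote(decipher(scrambled), safe='')}"
```

The real `decipher` is exactly the part that breaks whenever the player JavaScript changes, which is the complexity being debated in this thread.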

rasz 1231 days ago

Signature is old news (a couple of years old); generating a proper one (or copying it straight from YT using devtools) won't get you unthrottled access.

2h 1231 days ago

yeah, it will:

https://github.com/ytdl-org/youtube-dl/pull/30184

rasz 1230 days ago

&n= is not "same as what the code in the article does". The article talks about the old signatureCipher/sp/s/sig code. Without signatureCipher, URLs return 403; with signatureCipher but without a decoded &n=, URLs return fine but start throttling after just over ~1MB. My comment from Oct 2021:

https://github.com/ytdl-org/youtube-dl/issues/29326#issuecom...

"server heavily throttles any request to same URL after initial 2-4MB regardless of retries.

&n is only part of the puzzle. While a bad or missing &n will indeed trigger 50KB/s throttling, even a correct &n only lets you download at most a couple of megabytes at good speed. Try any video in the official YT client and you will see repeated URL requests with different &range= parameters, all using the same &n, but trying to download that URL all at once will always throttle after the initial ~2-4MB.

The correct solution (after generating correct &n) is to start using custom URL &range= parameters instead of normal HTTP range headers and default to downloading in 2MB chunks."
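The chunking trick from that quoted comment can be sketched in Python: append a `range=start-end` query parameter to the stream URL instead of sending an HTTP `Range` header, one 2MB piece at a time. This is only a sketch of the URL rewriting under the comment's 2021 description; the function names and example host are made up, "2MB" is read as 2 MiB, `content_length` is assumed to come from the stream's metadata, and whether YouTube still behaves this way is not established here.

```python
from urllib.parse import urlsplit, urlunsplit

CHUNK_SIZE = 2 * 1024 * 1024  # "2MB" from the quoted comment, read as 2 MiB


def with_range_param(url: str, start: int, end: int) -> str:
    """Return url with range=start-end in its query, replacing any existing range=."""
    parts = urlsplit(url)
    # Keep every other parameter byte-for-byte so signatures are not re-encoded.
    kept = [p for p in parts.query.split("&") if p and not p.startswith("range=")]
    kept.append(f"range={start}-{end}")
    return urlunsplit(parts._replace(query="&".join(kept)))


def chunk_urls(url: str, content_length: int, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """One URL per chunk; each covers an inclusive byte range of at most chunk_size."""
    return [
        with_range_param(url, start, min(start + chunk_size, content_length) - 1)
        for start in range(0, content_length, chunk_size)
    ]
```

Filtering the raw query string, rather than parsing and re-encoding it, is a deliberate choice: it leaves the `n` and signature values exactly as the server issued them.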

That was the case in 2021. I just checked the newest "fixed" yt-dlp, and if you change the chunk size from the default 10MB to 100MB, you will quickly notice the throttling is STILL there, kicking in right around that 2-4MB mark, but instead of a brutal 50KB/s it's somewhere around 1MB/s. The default chunk size of 10MB somewhat helps mask it by smoothing out the jumpy transfer rate. youtube-dl (I'm shocked it's still updated. No .exe download, so you have to grab a zip and run "python.exe __main__.py"?) just silently ignores "--http-chunk-size 100000000" altogether and keeps downloading in 10MB chunks, hiding the problem. "--print-traffic" shows 10MB chunks.

This is for all stream types other than 22 (mp4 1280x720 avc1.64001F, 30fps, mp4a.40.2). 22 seems to be special and with proper &n= you can slurp whole file with one connection without additional throttling, probably for backward compatibility with older clients?

TLDR: You can still download YT videos IF you chop them up into small chunks. Playback without chopping into chunks somewhat "works" because 1MB/s (about 8Mbit/s) is still above the thickest, juiciest bitrate YT would ever serve, but problems become obvious when you start fast-forwarding or skipping around the video (1-3 second pauses in mplayer). Playing type 22 works great and seeking is instant.
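Players that send ordinary HTTP `Range` headers, like the mplayer setup mentioned here, would need those headers translated into `range=` values and split into small pieces to benefit from the chunked scheme. A minimal Python sketch of only that translation step (no proxy server), assuming single-range `bytes=` headers and the 2MB chunk size suggested earlier in this comment, read as 2 MiB; the function names are hypothetical.

```python
import re
from typing import List, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")
CHUNK_SIZE = 2 * 1024 * 1024  # assumed 2 MiB, per the "2MB chunks" suggestion


def parse_range_header(value: str, content_length: int) -> Tuple[int, int]:
    """Parse a single 'bytes=...' Range header into an inclusive (start, end)."""
    m = _RANGE_RE.match(value.strip())
    if not m or not (m.group(1) or m.group(2)):
        raise ValueError(f"unsupported Range header: {value!r}")
    first, last = m.group(1), m.group(2)
    if not first:
        # Suffix form "bytes=-N": the final N bytes of the resource.
        start, end = max(content_length - int(last), 0), content_length - 1
    else:
        start = int(first)
        # Open-ended form "bytes=N-" runs to the end of the resource.
        end = min(int(last), content_length - 1) if last else content_length - 1
    if start > end:
        raise ValueError(f"unsatisfiable range {value!r} for length {content_length}")
    return start, end


def to_range_params(start: int, end: int, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Split [start, end] into 'a-b' strings for a range= URL parameter."""
    return [f"{s}-{min(s + chunk_size - 1, end)}"
            for s in range(start, end + 1, chunk_size)]
```

Each value would then be appended to the stream URL as `&range=...`, fetched in turn, and streamed back to the player as a single response; multi-range headers are rejected rather than guessed at.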

"Personally I use mplayer to stream YT and am currently on a lookout for a simple proxy server I could modify to do the above (divide into chunks, rewrite HTTP range header into URL parameter) for me transparently in the background."

ape4 1232 days ago

It's too bad such an important resource (YouTube) has a secret API that changes all the time.

Genghis_Khan 1232 days ago

> secret API

In their client-side code, they provide a worked example of how to use their API. That's hardly the way to keep a secret.

thrdbndndn 1232 days ago

Why? YouTube has a proper public API, that doesn't change all the time.

squarefoot 1232 days ago

However, a few years ago they started forcing API users to authenticate, so when I had to spend months in bed after a bad road accident and later a heart attack, I could no longer watch my favorite electronics channels using the Kodi YT extension unless I authenticated. I guess they still allow anonymous use in a browser only because that lets them profile more people.

kfarr 1232 days ago

The proper public API notably does not provide access to the raw video stream, making it useless for many use cases.

anamexis 1232 days ago

Can you retrieve videos with it?

morgannewman 1232 days ago

Furthermore, the economics of video hosting sites like YouTube are such that you have truly incredible storage, server, and bandwidth growth, basically forever. I don’t think it’s feasible for there to be a “free” API that lets people use YouTube as they please, build clones of the site with no ads, etc.

kyberias 1232 days ago

Well, that link to the Prolog introduction video isn't a really good starting point.

jscipione 1232 days ago

Why do the comment counts almost never match the actual number of comments? I know the answer is censorship but why doesn’t YouTube shadow-ban the comment count when they shadow-ban comments?

charrondev 1232 days ago

Honestly, I don’t know what type of comment moderation they are doing, but it’s pretty horrible. I constantly see obvious spam links or scammers as first-level nested comments, often pretending to be the video author doing “giveaways” or trying to siphon off information. It’s incredibly widespread and has been happening for months at least.

jscipione 1232 days ago

Ok, that’s an example of seeing spam comments; I’m asking about the comments you CAN’T see. For example, the link says 3 replies, but when you click it there are only 2 comments listed. Is there a technical reason why the reply count is not updated when comments are removed?

amelius 1232 days ago

Does anyone know if YouTube runs FFmpeg internally?

randomifcpfan 1232 days ago

Circumstantial evidence that they used to: https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/

These days they use special hardware accelerators: https://gwern.net/doc/cs/hardware/2021-ranganathan.pdf

latchkey 1232 days ago

I built software to efficiently run a large number of GPUs (>120k) in data centers. That second link is fantastic, but it really gives me PTSD. =)

pixl97 1232 days ago

While not fully related to the code itself, my daughter has a school-provided Chromebook that blocks almost all YouTube video content. You can browse the YT site, but the thumbnails and videos won't load. I'm assuming some kind of content block is happening based on some part of the URL.

Well, kids being clever figured out the Chromebook browser shows a preview video if you hit the 'share' button and go to embed video. This is not content blocked. I didn't dig in to see if it would play age restricted content as I assume all access is being logged somewhere and want to minimize future fall out.

paulpauper 1232 days ago

And now many of these bypasses and tricks will stop working.