by Andrews54757 261 days ago
Nsig/sig - Special tokens which must be passed to API calls, generated by code in base.js (player code). This is what has broken for yt-dlp and other third-party clients. Instead of extracting the code that generates those tokens (e.g. using regular expressions) like we used to, we now need to run the whole base.js player code to get these tokens because the code is spread out all over the player code.

PoToken - Proof of origin token which Google has lately been enforcing for all clients, or video requests will fail with a 403. On Android it uses DroidGuard; on iOS it uses built-in app integrity APIs. For the web it requires that you run a snippet of javascript code (the challenge) in the browser to prove that you are not a bot. Previously, you needed an external tool to generate these PoTokens, but with the Deno change yt-dlp should be capable of producing these tokens by itself in the near future.

SABR - Server side adaptive bitrate streaming, used alongside Google's UMP protocol to allow the server to have more control over buffering, given data from the client about the current playback position, buffered ranges, and more. This technology is also used to do server-side ad injection. Work is still being done to make 3rd party clients work with this technology (sometimes works, sometimes doesn't).
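
To make the nsig/sig point concrete, here is a minimal sketch of the older regex-style approach: pull the list of transform steps out of the player source and replay them in Python. Everything in it is an assumption for illustration. The JS snippet is a toy, the names (Xy, rv, sw, sp, decipher) are hypothetical, and this is not yt-dlp's actual extractor. It only works because the logic sits in one place; once the logic is spread across the player code, a pattern like this has nothing stable to match, which is why running the whole player becomes necessary.

```python
import re

# Toy stand-in for player code. Real base.js is far larger and obfuscated;
# every name here (Xy, rv, sw, sp, decipher) is hypothetical.
PLAYER_JS = '''
var Xy={rv:function(a){a.reverse()},
        sw:function(a,b){var c=a[0];a[0]=a[b%a.length];a[b%a.length]=c},
        sp:function(a,b){a.splice(0,b)}};
var decipher=function(a){a=a.split("");Xy.rv(a,1);Xy.sw(a,3);Xy.sp(a,2);return a.join("")};
'''

# Grab the body of the (hypothetical) decipher function, then each helper call in it.
BODY_RE = re.compile(r'decipher=function\(a\)\{(.*?)\}', re.S)
STEP_RE = re.compile(r'Xy\.(\w+)\(a,(\d+)\)')


def extract_steps(js: str) -> list[tuple[str, int]]:
    """Return the ordered (helper name, argument) pairs found by regex."""
    match = BODY_RE.search(js)
    if match is None:
        raise ValueError("decipher function not found; the pattern is stale")
    return [(name, int(arg)) for name, arg in STEP_RE.findall(match.group(1))]


def apply_steps(sig: str, steps: list[tuple[str, int]]) -> str:
    """Replay the extracted steps in Python instead of running the JS."""
    chars = list(sig)
    for name, arg in steps:
        if name == "rv":      # reverse the whole array
            chars.reverse()
        elif name == "sw":    # swap the first element with element arg % length
            i = arg % len(chars)
            chars[0], chars[i] = chars[i], chars[0]
        elif name == "sp":    # drop the first arg elements
            del chars[:arg]
        else:
            raise ValueError(f"unknown helper: {name}")
    return "".join(chars)


steps = extract_steps(PLAYER_JS)
print(steps)
print(apply_steps("abcdefgh", steps))
```

The helper semantics are hardcoded here to keep the sketch short; a real extractor would also have to recover what each helper does from the source.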

Nsig/sig extraction example:

- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...

- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...

PoToken generation:

- https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide

- https://github.com/LuanRT/BgUtils

SABR:

- https://github.com/LuanRT/googlevideo

EDIT2: Added more links to specific code examples/guides

3 comments

If you ever wondered why the likes of Google and Cloudflare want to restrict the web to a few signed, integrity-checked browser implementations?

Now you know.

>If you ever wondered why the likes of Google and Cloudflare want to restrict the web

I disagree with the framing of "us vs them".

It's actually "us vs us". It's not just us plebeians vs FAANG giants. The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen". They want to interact with real humans instead of bots. The following are manifestations of the same fear:

- small-time websites adding Anubis proof-of-work

- owners of popular Discord channels turning on the setting for phone # verification as a requirement for joining

- web blogs wanting to put a "toll gate" (maybe utilize Cloudflare or other service) to somehow make OpenAI and others pay for the content

We're long past the days of colleagues and peers of ARPANET and NSFNET sharing info for free on university computers. Now everybody on the globe wants to try to make a dollar, and likewise, they feel dollars are being stolen from them.

But this, too, skips over some nuance. There are a few types of actors here:

- small content creators who want to make their content accessible to individuals

- companies that want to gobble up public data and resell it in a way that destroys revenue streams for content creators

- gatekeepers like Cloudflare who want to ostensibly stop this but will also become rent-extractors in the process

- users who should have the right to use personal tools like yt-dlp to customize their viewing experience, and do not wish to profit at the expense of the creators

We should be cautious both that the gatekeepers stand to profit from their gatekeeping, and that their work inhibits users as well.

If creators feel this type of user (often a dedicated fan and would-be promoter) is a necessary sacrifice to defend against predatory data extractors… then that’s absolutely the creator’s choice, but you can’t say there’s a unified “us” here.

But then it's not (small creators + users) vs. the other parties you listed. Small creators, like small business, often exhibit the worst kinds of greed and exploitative behavior.

Also there's a lot of misalignment between users and providers at the cultural level - the society is yet to fully process the implications of "digital revolution" (and copyright industry meddling with everything isn't helping). A big chunk of that boils down to the same thing that started "the war on general-purpose computing": producers have opinions on how their products should be used, and want to force consumers to only use them as prescribed.

Whether it's because they want to exploit the consumers through a side channel (e.g. ads), or to "protect intellectual property", or because they see artistic value in the integrity of their creation, or because they think they know better than customers - reasons are many, but underneath them all, is the core idea the society hasn't yet worked out: whether, and to what degree, are producers even morally entitled to that kind of control.

My personal answer is: they're not (nor they are to their old business models). But then it's producers, not consumers, who have all the money and control here.

> small-time websites adding Anubis proof-of-work

Those were already public. The issue is AI bots DDoSing the server. Not everyone has infinite bandwidth.

> owners of popular Discord channels turning on the setting for phone # verification as a requirement for joining

I still think that Discord is a weird channel for community stuff. There are a lot of different formats for communication, but people are defaulting to chat.

> web blogs wanting to put a "toll gate" (maybe utilize Cloudflare or other service) to somehow make OpenAI and others pay for the content

Paid content is good (Coursera, O'Reilly, Udemy,...). But a lot of these services want to offer it for free, powered by ads (for audience?).

---

The fact is, we have two main bad actors: AI companies hammering servers and companies that want to centralize content (that they do not create) by adding gatekeeping extension to standard protocols.

> Now everybody on the globe wants to try to make a dollar, and likewise, they feel dollars are being stolen from them.

I'm not in it for the dollar. I just want the licenses I put on my content/code to be respected, that's all. IOW, I don't want what I put out there to be free forever (as in speech and beer) to be twisted and monetized by the people who are in this for the dollar.

I don’t feel like dollars are stolen from me. It’s more of companies abusing my goodwill to publish information online. From higher bills as a result of aggressive crawling, to copying my work and removing all copyright/licensing from the code. Sure, fair use and all, but when they return the same exact code it just makes me wonder.

Nowadays, producing anything feels like being the cow's udder.

I want my content borrowed/shared, and I still need to be engaged in this stuff because the poorly behaved distributed bots that have arisen in the past year are trying to take boundless resources from my site(s) that I cannot afford.
Then some of those small people are wrong too.

I wish we could all just stop fighting the truth of the tech -- it costs ZERO to make copies of things, and adjust accordingly.

Patreon (and keep it real, OnlyFans) are roughly the only viable long term models.

> The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen".

I'm sure some music creators may have, years ago, been against CD recorders, or platforms like Napster or even IRC-based file transfer for sharing music. Hell, maybe they were even against VCRs back in the day. But they were misguided at best.

People who want to prevent computer users from freely copying data are, in this context at least, part of "them" rather than "us".

Duh. I've known this for decades. The biggest advocates for DRM I've known are small-time content creators: authors, video producers, musicians. They've been saying the same thing since the 90s: without things like DRM, their stuff would be pirated, and they'd like to earn a living doing what they love instead of grinding at a day job to support themselves while everybody benefits from their creative output. In addition, major publishers and record labels won't touch stuff that's been online because of the piracy risk. They don't want to make an investment in smaller creators without a return in the form of sales of copies. That last bit is less true of music now than it used to be because of streaming and stuff, but the principle still applies.

This is why the DMCA will never be repealed, DRM will never go away, and there is no future for general purpose computing. People want access to digital content, but the creators of that content wouldn't release it at all if they knew that it could be copied endlessly by whoever receives it.

That isn't entirely true. Perhaps it's because small content creators aren't a monolithic group. There are a few who try the alternative approaches and succeed. For example, whenever buying ebooks, I first check if the author sells it directly or through small publishers. It's always a better deal if they do. Cheaper than what you pay on amzn, DRM-free and occasionally lifetime free updates (eg: The Kubernetes book by Nigel Poulton). Despite the lower price, the author gets most, if not all of what you pay. They're sometimes liberal with the sharing policy too. They ask you to not share it around in large numbers, while conceding that just a copy or two is expected. I find this to be a reasonable demand. Therefore I encourage people to buy a copy for themselves if they like the book.

I have heard of someone trying this approach with music albums and succeeding at it. The album is more likely to go viral due to the ease of sharing, while you'll always find consumers who volunteer to pay you. While the returns per copy are low, the large number of copies means that your profits may be higher than if it were DRM-encumbered. Musicians may also like the fact that there are no powerful middlemen that they have to contend with. In fact, this is what YouTube creators already do when they choose alternative monetization paths like Patreon.

What's really needed is for people to support and encourage this model and such creators. We used to earlier blame them saying that people choose convenience and short term savings over long term market health. But that's no longer applicable. People are so fed up with being exploited under consumerism that they've started boycotting these big players to regain their independence and self sufficiency. The real issue preventing open digital markets is just the lack of awareness of their existence. This message has to be spread somehow.

https://en.wikipedia.org/wiki/Useful_idiot is the type of person that will speak against their own and the common good because someone told them it's bad.

Just look at the hordes of people advocating Brave, which is a series scam company project.

It's us vs them. What big corps want is fundamentally adversarial due to its motivation. I like to think that humans can conceptually not be your enemy.
> The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen"

... or just keep their site on the Internet. There hasn't been any major progress on sanctioning bad actors - be it people running vulnerable IoT crap that ends up being taken over by a botnet, cybercriminals and bulletproof hosters, or nation state actors. As long as you don't attack targets from your own geopolitical class (i.e. Russians don't attack Russians, a lot of malware will just quit if it spots Russian locale), you can do whatever the fuck you want.

And that is how we end up with darknet services where you can trivially order a DDoS taking down a website you don't like or, if you manage to get your opponent's IP leaked during an online game, their residential IP address. Pay with whatever shitcoin you have, and no one is any wiser who the perpetrator is.

>The small-time independent publishers and creators also want to restrict the web

Oh really? Does Linus's Floatplane go to this extent to prevent users from downloading stuff? Does Nebula? Does whatever that gun youtuber's version of video site do this?

Does Patreon?

It’s like we are living in an affordability crisis and people are tired of 400 wealthy billionaires profiting from people's largesse in the form of free data/tooling.
When Nixon slammed the gold window shut so Congress could keep writing blank checks for Vietnam and the Great Society, it wasn't just some monetary technicality. It was the moment America broke its word to the world and broke something fundamental in us too. Suddenly money wasn't something you earned through sweat or innovation anymore. It became something politicians and bankers could conjure from thin air whenever they wanted another war, another corporate bailout, another vote-buying scheme.

Fast forward fifty years and smell the rot. That same fiscal recklessness (Congress spending like drunken sailors while pretending deficits don't matter) has bled into every pore of society. Why wouldn't it? When BlackRock scoops up entire neighborhoods with Fed-printed cash while your kid can't afford a studio apartment, people notice. When Tyson jacks up chicken prices to record profits while diners can't afford bacon, people feel it. And when some indie blogger slaps a paywall on their life's work because OpenAI vacuumed their words to train ChatGPT? That's the same disease wearing digital clothes.

We're all living in Nixon's hangover. The "us vs us" chaos you see (Discord servers demanding your phone number, small sites gatekeeping against bots, everyone scrambling to monetize scraps): that's what happens when trust evaporates. Just like the dollar became Monopoly money after '71, everything feels devalued now. Your labor? Worth less each year. Your creativity? Someone's AI training fuel. Your neighborhood? A BlackRock asset on a spreadsheet.

And Washington's still at it! Printing trillions to "save the economy" while inflation eats your paycheck alive. Passing trillion-dollar "infrastructure bills" that somehow leave bridges crumbling but defense contractors swimming in cash. It's the same old shell game: socialize the losses, privatize the gains. The factory worker paying $8 for eggs understands this. The nurse getting lectured about "wage spirals" while hospital CEOs pocket millions understands this. The teenager locking down their Discord because bots keep spamming scams? They understand this.

Weimar happened when money became meaningless. 1971 happened when promises became meaningless. What you're seeing now (the suspicion, the barriers, the every-man-for-himself hustle) is what bubbles up when people realize the whole system's running on fumes. The diner owner charging $18 for a burger isn't greedy. The blogger blocking AI scrapers isn't a Luddite. They're just building levees against a flood Washington started with a printing press half a century ago.

The tragedy is that we're all knee-deep in the same muddy water, throwing sandbags at each other while the real architects of this mess (the political grifters, the Fed bankers, the extraction-engine capitalists) watch dry-eyed from their high ground. Until we stop accepting their counterfeit money and their counterfeit promises, we'll keep drowning in this rigged game. The gold window didn't just close in '71. The whole damn social contract rusted shut.

Sir, this is a Wendy's.

The gold standard is objectively terrible economic policy and "society was better when I was young" has been a meme for thousands of years.

It feels nice to attribute everything bad to this one weird trick, but it's fake.

19th-century factory towns sucked dick. Child labor, no safety nets, cholera outbreaks. Yeah, not exactly Disneyland. But that's not the argument. The gold standard didn't cause those horrors. Unchecked capitalism and shitty labor laws did. What gold did do was force a brutal honesty. You couldn't fake prosperity. If you wanted a war or a welfare program, you either taxed people directly or found the gold to pay for it. No printing press to hide the pain. Nixon's 1971 move didn't just detach the dollar from gold. It detached political accountability from fiscal reality. Every fucking crisis since '71 gets "solved" the same way: print, borrow, and kick the can. Gold didn't prevent recessions; it prevented this: a system where elites privatize gains but socialize losses. Your grandpa's $0.25/hour wage in 1950 bought 10 loaves of bread. Your $15/hour today buys 3 loaves. Your 401k inflates, your rent explodes, and Washington shrugs: "Inflation's transitory, bro!"

Gold "terrible"? Tell that to the single mom paying 40% of her paycheck to rent a BlackRock-owned apartment. Why's BlackRock her landlord? Because fiat made debt cheaper than dirt. They borrowed billions at 0% from the Fed, bought entire neighborhoods, and jacked rents. Under gold? Interest rates would've spiked, crushing their leveraged bets. But nah. We got "QE Infinity" instead. Today's policy is literally cronyism with extra steps: print to bail out banks which causes inflated assets which squeezes workers. Rinse. Repeat.

What does any of this have to do with yt-dlp?
Ostensibly the same forces that drove Nixon to move the dollar off of gold, are driving Google to destroy third party YouTube clients.
Wow. That was eloquent, and coherent, and depressing. I'd be grateful for someone to counter with something less dismal. Good things are still happening in the world. A positive future remains possible -- but we have to be able to imagine it to bring it into being.
Semi coherent. The greed and corruption is a real theme but would still be 100% possible while on the gold standard.
They'd have to physically steal gold from people, and people would notice that. Or they could mine more gold, but that's hard. Or they could publicly and officially change the exchange rate (of dollars to gold), and people would notice that politicians make it go down, the same way that people notice when politicians make taxes go up (they notice way more than when prices other than taxes go up).

With the current system, they (the central bank) can just increase some people's numbers in some spreadsheets, and the effects are extremely indirect. Nominally this is in exchange for assets of equal value so the situation returns the normal after some time, but that hasn't been happening - the amount of money created this way has not been decreasing at any meaningful rate.

You're goddamn right corruption existed under gold. Nobody's claiming it was some purity paradise. But here's the difference: gold didn't enable systemic looting on a planetary scale. Gold didn't prevent sin; it prevented the industrialization of sin. The robber barons were sharks in a pond. Today's fiat-enabled oligarchs are gods reshaping reality. When you can print the fucking rules, accountability evaporates. That's why Nixon's 1971 decision wasn't just policy. It was an engraved invitation for the largest wealth transfer in human history.

Greed and corruption absolutely festered under the gold standard. Boss Tweed embezzled a fortune from New York City coffers, Vanderbilt strong-armed railroads, and Rockefeller crushed competitors with predatory pricing. But here's the huge distinction: gold acted like a leash on a rabid dog. It didn't kill the beast, but it kept it from devouring the whole goddamn village. When robber barons got too greedy in the 1800s, their schemes imploded under gold's brutal discipline. Jay Cooke's bank collapsed in 1873? No Fed stepped in with trillions in printed cash to resurrect his corpse. Markets purged the rot, losers ate shit, and the system reset in years. Not decades of zombie corporations propped up by cheap debt. Corruption back then was like a bar fight: bloody, ugly, but contained. Tweed stole existing gold coins. He couldn't order the Treasury to mint him a fresh fortune overnight.

Vanderbilt couldn't borrow billions at 0% interest from a central bank to buy every competitor; he had to convince investors with real profits, not financialized vapor. Fast-forward to today: fiat didn't invent greed. It weaponized it. LBJ funded Vietnam and the Great Society without raising taxes because the printer go brrr. BlackRock gobbles neighborhoods with Fed-subsidized debt while renters bleed. Tyson jacks food prices 20%, blames "inflation," and pockets record profits because fiat decouples prices from real value. Banks peddle toxic mortgages knowing the Fed will bail them out. Politicians pass $6T "stimulus" bills while your paycheck buys less bread than a 1950s factory worker's. That's the cancer Nixon unleashed in '71. Not corruption itself, but its metastasis into a globalized, systemic looting operation where elites privatize gains, socialize losses, and inflation becomes a tax on the powerless. Gold didn't stop crooks. It stopped crooks from becoming untouchable gods. The 1800s proved humans will always be greedy. Fiat just gave them the universe's credit card and told your grandkids to foot the bill.

Well on the bright side blood avocados are still green. Which the poster also seems to appreciate.
Lately I've had to resort to buying avocados from Costco in those little plastic cups because whole avocados in many supermarkets in my region have started to spoil too quickly. Sad.
until people learn money, the concept, nothing will change. and that in turn will hardly happen while the bad guys own childhood (compulsory schooling).
I don't know, it's really hard to blame them. In a way, the next couple of years are going to be a battle to balance easy access to info with compensation for content creators.

The web as we knew it before ChatGPT was built around the idea that humans have to scavenge for information, and while they're doing that, you can show them ads. In that world, content didn't need to be too protected because you were making up for it in eyeballs anyway.

With AI, that model is breaking down. We're seeing a shift towards bot traffic rather than human traffic, and information can be accessed far more effectively and, most importantly, without ad impressions. So, it makes total sense for them to be more protective about who has access to their content and to make sure people are actually paying for it, be it with ad views or some other form of agreement.

Don’t worry!

Ads are coming to AI. The big AI push next will be context, your context all the time. Your phone will “help” and get all your data to OpenAI…

“It looks like you went for a run today? Good job, you deserve a treat! Studies show a little ice cream after a long run is effectively free calories! It just so happens the nearest Dairy Queen is running a promotion just for the next 30 minutes. I’m getting you directions now.”

It would not be that much of a problem if ads promoted healthy and tasty food but they will probably promote an ice-cream made from a powder and chemicals emulating taste of berries rather than from milk and fresh-picked berries.
It still would be. Loss of agency. Ads are text and images you see. Native advertising in a chatbot conversation is a third party bidding their way into your conversation. Machine showing you an ad versus injecting intention into your context are very different things.
If open source AI becomes good enough would this model hold? I guess they will try to shut down the open models as they come close?
Depends on scaling.

But remember, the model is the engine.

Your ChatGPT, Claude’s, etc are products. They run on LLMs but also code and tools on the backend.

Run a local model all you want, it’ll never make a fillable PDF for you or remember your context on its own.

This is why contra Louis Rossman, Clippy was not a good thing for humanity.
"I'm calling the user analysis tool... it seems this user is health conscious. I'll suggest a trail app for their next run instead of ice cream."
I think your point is valid, but FTR the "shift" happened long before ChatGPT; bot traffic has exceeded that of humans for over a decade.
Weird, people talking about small-time creators wanting DRM. I've never seen that... Usually they'd be hounding for any attention? I don't know why multiple accounts are seemingly independently bringing this concept up, but maybe it is an attempt to muddy the waters?
At least for YouTube, viewbotting is very much a thing, which undermines trust in the platform. Even if we were to remove Google ads from the equation, there’s nothing preventing someone from crafting a channel with millions of bot-generated views and comments, in order to land paid sponsor placements, etc.

The reasons are similar for Cloudflare, but their stances are a bit too DRMish for my tastes. I guess someone could draw the lines differently.

If any of this was done to combat viewbotting, then any disruption to token calculation would prevent views from being registered - not videos from being downloaded.
From my perspective both problems are effectively the same. I want to count unique users by checking for asset downloads and correlating unique session IDs. People can request the static assets directly, leading to view botting and waste of egress bandwidth.

The solution: have clients prove they are a legitimate client by running some computationally intensive JS that interacts with DOM APIs, etc. (which is not in any way unique to big tech, see Anubis/CreepJS etc.)

The impact on the hobbyist use case is, to them, just collateral damage.

No, the difference is: if I'm fighting viewbots, I want zero cues to be emitted to the client. The client should NEVER know whether its view is being counted or not, or why.

Having no reliable feedback makes it so much harder for a viewbotter to find a workaround.

If there's a visible block on video downloads? They're not fighting viewbots with that.

For general spam deterrence I agree, but how do you prevent paying for the bandwidth in this case?
Youtube has already accounted for this by using a separate endpoint to count watch stats. See the recent articles about view counts being down attributed to people using adblockers.

Even if they hadn't done that, you can craft millions of bot-sponsored views using a legitimate browser and some automation and the current update doesn't change that.

So I'd say Occam's razor applies and Youtube simply wants to be in control of how people view their videos so they can serve ads, show additional content nearby to keep them on the platform longer, track what parts of the video are most watched, and so on.

I'm sure that's a problem for Youtube. What does it have to do with me rendering Youtube videos on my own computer in the way I want?
> What does it have to do with me rendering Youtube videos on my own computer in the way I want?

It doesn't. That interferes with google's ad revenue stream, which is why YT continues to try to make it harder and harder to do so.

You don't have that right. When you view copyrighted content, you do so at the pleasure of the licensor.
How you watch copyrighted content has never been something that copyright has controlled.
If the content needs to be copied or downloaded in order to be watched, you may do so exclusively under terms set by the licensor, period. You may not even get fair use rights, as to get the content in the first place you might have to agree to terms of service waiving them, and being found to use the content in an unapproved way would be grounds for cutting off your access.
Like another comment mentioned: that's a problem for YouTube to solve.

They pay a lot of money to many smart people who can implement sophisticated bot detection systems, without impacting most legitimate human users. But when their business model depends on extracting value from their users' data, tracking their behavior and profiling them across their services so that they can better serve them ads, it goes against their bottom line for anyone to access their service via any other interface than their official ones.

This is what these changes are primarily about. Preventing abuse is just a side benefit they can use as an excuse.

As a viewer, this is not even remotely my problem.
> which undermines trust in the platform

What? What does this even mean? Who "trusts" youtube? It's filled with disinformation, AI slop and nonsense.

An example is given right after that sentence. Trustworthiness of the content is an entirely separate thing.
you forgot the excessive censorship, of course to "fight disinformation"...

it even became an interesting signal which "disinformation" they deem censorship-worthy.

The fact you shoved Cloudflare in there shows your ignorance of the actual problems and solutions offered.
There could be valid reasons for fighting downloaders, for example:

- AI companies scraping YT without paying YT let alone creators for training data. Imagine how much data YT has.

- YT competitors in other countries scraping YT to copy videos, especially in countries where YT is blocked. Some such companies have a "move all my videos from YT" function to promote bloggers' migration.

>AI companies

Like Google?

>scraping YT without paying YT let alone creators for training data

Like Google has been doing to the entire internet, including people’s movement, conversations, and habits… for decades?

> Like Google?

Like Google competitors obviously.

> Like Google has been doing to the entire internet, including people’s movement, conversations, and habits… for decades?

Yes, but if you allowed Google to index your site (companies even spent money to make their sites more indexable), Google used to bring customers, whereas AI companies bring back nothing. They are just freeloaders.

- Enforce views of ads

(not debating the validity of this reason, but this is the entire reason Youtube exists, to sell and push ads)

Then they should allow a download API for paying customers.
But even if you’re a paying customer, the creator is only paid if you watch it on the platform.
Music labels publish their music on YT in exchange for ad revenue; they won't be happy if someone downloads their music for free. Making music is expensive: google how much just a single drum mic costs, and you need a lot of them.
> for paying customers
YT shares income from subscriptions with music labels? I hadn't heard about this, and even if they did, a download would have to be paid much more than a view, because after downloading a person could potentially listen to a track a hundred times in a row.
It's not YT's content though.
Who says these are valid?
Why is this being downvoted? Are people really gonna shoot the messenger and fail to see why a company may be willing to protect their competitive position?
Everything trends towards centralization on a long enough period.

I laugh at people who think ActivityPub or Mastodon or BlueSky will save us. We already had that, it was called e-mail, look what happened once everyone started using it.

If we couldn't stop the centralization effects that occurred on e-mail, any attempt to stop centralization in general is honestly a utopian fool's errand. Regulation is easier.

I am a big supporter of AT Protocol, and I contribute some money to a fund to build on it. Why laugh at running experiments? Nothing will "save us," it is a constant effort as long as humans desire to use these systems to connect. Email exists today, and is very usable still as a platform that cannot be captured. The consolidation occurred because people do not want to run their own servers, so we should build for that! Bluesky and AT Protocol are experiments in building something different, with different use cases and capabilities, that also cannot be captured. Just like email. You can run your own PDS. You can run your own stack from PDS to users "end to end" if you so choose. You can pay to do both of these tasks. No one can buy this or take it away from you, if it is built on protocols instead of a platform someone can own and control.

Regulation would be great. The EU does it well. It is lacking in the US, and will be for some time. And so we have to downgrade to technical mitigations against centralization until regulation can meet the burden.

e-mail can't handle 24/7 1k posts/sec traffic which Twitter was about. A more appropriate analogue is IRC.
And barely a few days after google did it the fix is in.

Amazing how they simply couldn't win - you deliver content to client, the content goes to the client. Could be the largest corporation of the world and we still have yt-dlp.

That's why all of them wanted proprietary walled gardens where they would be able to control the client too - so you get to watch the ads or pay up.

> For the web it requires that you run a snippet of javascript code (the challenge) in the browser to prove that you are not a bot.

How does this prove you are not a bot? How does this code not work in a headless Chromium if it's just client-side JS?

Good question! Indeed you can run the challenge code using headless Chromium and it will function [1]. They are constantly updating the challenge however, and may add additional checks in the future. I suppose Google wants to make it more expensive overall to scrape Youtube to deter the most egregious bots.

[1] https://github.com/LuanRT/BgUtils

LLMs solve challenges. Can we not solve these challenges with sufficiently advanced LLMs? Gemini even, if you're feeling lulz-y.
Yes, by spending money.
I agree, in some cases and depending on LLM endpoint, some money may need to be spent to enable ripping. But is it cheaper than paying Youtube/Google? That is the question.
sometimes, it's not about the cost. it's about who/where the money is being spent.
Once JavaScript is running, it can perform complex fingerprinting operations that are difficult to circumvent effectively.

I have a little experience with Selenium headless on Facebook. Facebook tests fonts, SVG rendering, CSS support, screen resolution, clock and geographical settings, and hundreds of other things that give it a very good idea of whether it's a normal client or Selenium headless. Since it picks a certain number of checks more or less at random and they can modify the JS each time it loads, it is very, very complicated to simulate.

Facebook and Instagram know this and allow it below a certain limit because it is more about bot protection than content protection.

This is the case when you have a real web browser running in the background. Here we are talking about standalone software written in Python.
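
A rough sketch of why the "random subset of checks" part matters, in Python since that is what yt-dlp is written in. All names and values are hypothetical placeholders loosely based on the checks listed above; this is not Facebook's or YouTube's real logic. The server samples a few probes per page load, and a client only passes if every sampled probe returns what a normal browser would.

```python
import random

# Hypothetical probe pool with the answers a normal browser would give.
EXPECTED = {
    "font_metrics": "browser-like",
    "svg_render": "browser-like",
    "css_support": "browser-like",
    "screen_resolution": "browser-like",
    "clock_settings": "browser-like",
    "geo_settings": "browser-like",
}


def pick_checks(rng: random.Random, k: int) -> list[str]:
    """Server side: choose k probes at random for this page load."""
    return rng.sample(sorted(EXPECTED), k)


def looks_like_browser(answers: dict[str, str], checks: list[str]) -> bool:
    """Pass only if every sampled probe matches the expected value."""
    return all(answers.get(name) == EXPECTED[name] for name in checks)


def pass_rate(answers: dict[str, str], k: int, trials: int = 2000, seed: int = 0) -> float:
    """Estimate how often a client with these answers gets through."""
    rng = random.Random(seed)
    passed = sum(looks_like_browser(answers, pick_checks(rng, k)) for _ in range(trials))
    return passed / trials


real_browser = dict(EXPECTED)
# A spoofing client that has only hardwired four of the six probes.
spoofer = {name: EXPECTED[name] for name in sorted(EXPECTED)[:4]}

print(pass_rate(real_browser, k=3))
print(pass_rate(spoofer, k=3))
```

A client that fakes only some probes passes only when the sample happens to avoid the ones it missed, and it cannot tell in advance which ones those will be. With hundreds of probes that change per load, covering them all pushes you toward running a real browser.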

How does testing rendering work? Can JavaScript get pixel data from the DOM?
So the way this works is to draw fonts/svgs inside the canvas and check the pixels, that makes sense
This is just one element among many others. They probably have many available and others in reserve in case one becomes obsolete.

I recently discovered that audio codecs, frequencies, resolution, mix volume, etc. are accessible via JS in the browser and that this allows fingerprinting. Since we are talking about YouTube, the same type of technique should be possible with video codecs.

why can a bot dev not just get all of these values from the laptop's settings and hardwire the headless version to have the same values?
Because the expected values are not fixed, it is possible to measure response times and errors to check whether something is in the cache or not, etc.

There are a whole host of tricks relating to rendering and positioning at the edge of the display window and canvas rather than the window, which allow you to detect execution without rendering.

To simulate all this correctly, you end up with a standard browser, standard execution times, full rendering in the background, etc. No one wants to download their YouTube video at 1x speed and wait for the adverts to finish.