Hacker News
by eugeniub 1753 days ago
Maintainers: 2 months of inactivity on a FOSS project

Redditors: Yep, this is the end.

13 comments

That's not an accurate assessment of what's going on. The post linked states "no contact with the maintainers" not just inactivity, and if you follow the github issues linked, it's about people wondering what's going on and if anyone else can approve pull requests because there's lots of pull requests waiting. There's 843 pull requests at this time, and I just looked and over 50 are from just the last month.

It's not that there's repo inactivity; it seems to be that this is an extremely active repo which saw everything grind to a halt when the admins went dark. That's quite a bit different than just "inactivity".

> There's 843 pull requests at this time, and I just looked and over 50 are from just the last month

That's kinda overwhelming though ... imagine that if the maintainer pops up somewhere, suddenly 100 motivated people may chime in "hey please review this important pull request that's been sitting over here for a while".

There are some kinds of open source projects that are prone to this ... some are really not so bad to maintain if you have the right kind of discipline, because they converge on a stable set of functionality and platform compatibility evolves slowly, but some just naturally have endless room for variations and special cases, and as users increase, PRs increase linearly (instead of sub-linearly as you'd hope). I'm thinking in particular of https://github.com/oauth2-proxy/oauth2-proxy (I contributed to an older fork of it).

youtube-dl relies on up-to-date definitions of sites in order to download the content.

When a webpage changes layout, youtube-dl needs to be updated as well.

We're talking mostly about a list of site definitions more than we are core development.
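To make the "list of site definitions" idea concrete, here is a minimal sketch of how such a list might be structured: each entry pairs a URL pattern with a function that knows how to handle that site. This is not youtube-dl's actual code; the site names, URL patterns, and function names are all invented for illustration.

```python
import re
from typing import Callable, Optional


# Hypothetical extractors: each one turns a matched URL into video metadata.
# None of these names come from youtube-dl itself.
def extract_example_video(match: re.Match) -> dict:
    return {"site": "example-video", "id": match.group("id")}


def extract_example_clips(match: re.Match) -> dict:
    return {"site": "example-clips", "id": match.group("id")}


# The "site definitions": when a site changes its URLs or layout,
# only the matching entry needs to be touched.
SITE_DEFINITIONS: list[tuple[re.Pattern, Callable[[re.Match], dict]]] = [
    (re.compile(r"https?://video\.example\.com/watch/(?P<id>\w+)"), extract_example_video),
    (re.compile(r"https?://clips\.example\.org/c/(?P<id>\d+)"), extract_example_clips),
]


def extract(url: str) -> Optional[dict]:
    """Return metadata from the first definition whose pattern matches, else None."""
    for pattern, extractor in SITE_DEFINITIONS:
        match = pattern.match(url)
        if match:
            return extractor(match)
    return None
```

The dispatch loop is the "core" and rarely changes; the ongoing work is in the table entries, which is why a steady stream of small pull requests is normal for this kind of project.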

Months before the lawsuit, youtube-dl's maintainers frequently closed issues that reported ongoing breakage without giving a reason. Here is one especially illuminating example:

https://github.com/ytdl-org/youtube-dl/issues/23860

And there is also an entire fork that fixes the support for just a single provider, NicoNico, because the maintainers ignored its issues.

https://github.com/animelover1984/youtube-dl

A quote from its README:

All code in this project is licensed solely with the condition that any portion of it is not permitted to be used in the main youtube-dl fork, either directly or indirectly. It is also not permitted to be used in any project that contains contributions from either remitamine or dstftw.

The two users mentioned are or were previously major contributors to youtube-dl.

It seems that youtube-dl was already a dysfunctionally managed project at the time of the lawsuit and happened to ride out on the good PR for a couple of months, before returning to stagnation once again.

To me it sounds like a plugin system would have prevented centralization and the need for forks, but would have made distribution harder for average users.

Indeed, this has been years in the making. Maintainer activity has been slowly dwindling while would-be contributors were driven away by the maintainers’ lack of communication and abrasiveness. I myself have had pull requests languishing there for years with nobody bothering to review them. Other people had their issues closed with no explanation. It was just a matter of time. Good thing that the forks have sprung up some time before upstream development halted entirely.
2 months is a long time in youtube-dl world. It's not really a "software project" in the traditional sense, where you can stop working on it once it's "done". It's more of a "social project", a focal point for the required ongoing activity necessary to keep sites working. Youtube-dl without daily commits is useless.
I'm currently using it to download a youtube video. If it still works for its main function, maybe it's just not a high priority project until something breaks?
It's actually already broken. If you try to download more than 3-16 videos (the limit is not clear), you start to be rate limited to 300 kbps or so. According to Reddit this is fixed in a fork called yt-dlp
To be honest that sounds fine. You're presumably downloading it to watch it offline, the download speed isn't really material as long as it finishes eventually
youtube-dl is also used by video players like mpv for live streaming. In this case, 70 kbps is completely unusable for the vast majority of videos.
One other workaround is to use '-F' to get a list of formats, and choose one of the non-default formats to download. This seems to work for me.
Primary function, perhaps, but it's used for lots of websites that aren't YouTube, and some of those websites have broken.

There are open pull requests and/or bugs for many websites that aren't being approved. This is rather unusual for youtube-dl; it normally had a release every two weeks or so.

It may say “YouTube” in the name, but it’s a tool designed to work with many sites.
For a project like youtube-dl it is a long time, because they use unofficial APIs (fancy word for scraping) of video sites that can shift even on a daily basis. If you look at their GitHub issues it is just people endlessly complaining that some websites are broken again.
> unofficial APIs (fancy word for scraping)

Using a non-public API is not at all the same as scraping, which refers to parsing a rendered HTML page for the content you want.

Both have this maintenance problem, but one's not a fancy word for the other.

That used to be true, but today, with so many websites operating as SPAs against undocumented APIs, I think it's reasonable to redefine "scraping" to mean extracting data from unofficial APIs in addition to extracting it by parsing HTML.

After all, what is a scrapeable HTML page if not a grotesquely convoluted undocumented API with an unstable output format?

Scraping refers specifically to extracting data from a format designed to be read by humans instead of machines.

The gross inefficiency and low data-to-layout ratio are the key things being expressed through connotations of the word "scrape". To scrape is to extract a small amount of something from a much larger substrate.

To call every query a scrape is to diminish the specificity and utility of the term.

If an unofficial API returns JSON that looks like this:

    {
      "id": 3422,
      "title": "My essay about cheese",
      "published": "13th August 2021 at 3:45pm",
      "abstract": "<p>In which I write about cheese!</p>"
    }
And I write code against that which includes stripping the HTML tags from "abstract" and converting the date format in "published" into an ISO datetime... am I writing scraping code?

I would argue that I am, even though it started out as a JSON wrapper.
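For illustration, a minimal sketch of the cleanup code described above, using only the Python standard library. The function names are made up for this example, and the regex-based tag stripping is a shortcut that is only adequate for simple fragments like this one.

```python
import json
import re
from datetime import datetime

RAW = """{
  "id": 3422,
  "title": "My essay about cheese",
  "published": "13th August 2021 at 3:45pm",
  "abstract": "<p>In which I write about cheese!</p>"
}"""


def strip_tags(html: str) -> str:
    # Crude tag removal; fine for simple fragments, not for arbitrary HTML.
    return re.sub(r"<[^>]+>", "", html).strip()


def to_iso(published: str) -> str:
    # Drop the ordinal suffix ("13th" -> "13") so strptime can parse the day.
    cleaned = re.sub(r"^(\d{1,2})(st|nd|rd|th)\b", r"\1", published)
    return datetime.strptime(cleaned, "%d %B %Y at %I:%M%p").isoformat()


def clean(record: dict) -> dict:
    """Return a copy of the record with a plain-text abstract and ISO timestamp."""
    return {
        "id": record["id"],
        "title": record["title"],
        "published": to_iso(record["published"]),
        "abstract": strip_tags(record["abstract"]),
    }


cleaned = clean(json.loads(RAW))
```

Both helpers encode guesses about a format nobody promised to keep stable (the ordinal suffix, the "at", the lowercase "pm"), which is the same fragility you get when parsing HTML.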

"To call every query a scrape is to diminish the specificity and utility of the term."

Absolutely disagree with you there. I interpret the term "scraping" as "writing code that gathers data from a source that has not deliberately published that data in a usable format". Gathering data from any kind of API fits that criterion for me, since most APIs only give you a subset of the data at a time.

I think the reason I care so much about this is that I coined the term "git scraping" to cover a variant of scraping that uses Git repositories to store the data and track changes over time - and git scraping applies equally to data sourced from APIs as it does to data sourced from HTML pages. https://simonwillison.net/2020/Oct/9/git-scraping/
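One piece of that pattern can be sketched in a few lines: writing each fetched snapshot in a stable, pretty-printed form so that the repository history only records real changes. This is an illustrative sketch, not code from the linked post; `write_snapshot` is a hypothetical helper, and the fetch and `git commit` steps are left out.

```python
import json
from pathlib import Path


def write_snapshot(data: object, path: Path) -> bool:
    """Write data as stable, pretty-printed JSON. Return True if the file changed."""
    # sort_keys and a fixed indent keep the output byte-identical when the
    # data is unchanged, so diffs show only real changes.
    text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True
```

A scheduled job would call this after each fetch and commit only when it returns True, regardless of whether the data came from an API or from parsed HTML.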

If everyone insists that is what it means for long enough, then that is what it will mean.

The term was coined to differentiate how difficult it is to extract data from a format that was patently not intended to efficiently spread raw data to other machines. If that meaning erodes, and it's just yet another way to say an API query, it will be a great loss for the precision of our terminology.

So you get a video feed in the end, that is viewed by a human, presumably?
What you do with the data after you have it is not a qualification for whether or not you got it by scraping.
Sounds like an exhausting thing to maintain. It's not like writing scraping code (or even just tracking slight variations in an API) is terribly interesting.
True, but it’s also the kind of product that’s instantly useful and itch-scratchy. Youtube-dl not working on the video you’re downloading today? Well if you’re a maintainer you can just patch it yourself. (Non-maintainers can too of course, but I imagine the maintainers have the know-how to actually fix things)
Given how much youtube and the other sites it supports change to subvert the use of this very tool, I can see why people are wondering what's happening.
Having contributed to youtube-dl in the past, long turnaround times from the maintainers were pretty normal. I've had (and still have some) open PRs that have been ready to merge for going on a year. The two months is really not that big of a deal.

That being said, the project probably could use some reorganization. It requires a lot of community contributions to keep all the extractors maintained, so long turnaround times for reviews aren't ideal.

> Considering there's been weekly releases for years, two months is odd and worrisome.

https://old.reddit.com/r/DataHoarder/comments/p9riey/youtube...

Sounds more like maybe the person is sick or something.

As others mentioned, it’s different with YouTube, as the site changes constantly.
Also, as with other massive online properties such as Amazon, it doesn't change consistently: changes sometimes roll out a bit at a time so users in different areas get different versions, either because global roll-out is a staged process or because a UI experiment is being performed. It usually doesn't matter to a well-written scraper, as the core data is still accessible in the same way despite the UI sugar coat having changed, but sometimes there are significant enough changes under the hood too that the scraper needs to deal with them while still supporting the older format(s).
Exactly. It is ok to compete, but announcing that your competitor's project is dead to promote your own is a little sketchy, to say the least.
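A common way to cope with staged roll-outs is to keep one parser per known page version and try them in order. The sketch below shows that fallback chain; the markup, the `videoUrl` field, and the function names are all invented for illustration and do not describe any real site.

```python
import re
from typing import Callable, Optional


# Hypothetical parsers for two versions of the same page.
def parse_new_layout(page: str) -> Optional[str]:
    # Newer version: the URL is embedded in a JSON blob.
    m = re.search(r'"videoUrl":\s*"([^"]+)"', page)
    return m.group(1) if m else None


def parse_old_layout(page: str) -> Optional[str]:
    # Older version: the URL is an attribute on a <video> tag.
    m = re.search(r'<video[^>]*\ssrc="([^"]+)"', page)
    return m.group(1) if m else None


# Newest first; older parsers stay until the old version stops being served.
PARSERS: list[Callable[[str], Optional[str]]] = [parse_new_layout, parse_old_layout]


def find_video_url(page: str) -> Optional[str]:
    """Return the first URL any known parser can find, else None."""
    for parser in PARSERS:
        url = parser(page)
        if url is not None:
            return url
    return None
```

When every parser returns None, that is the signal that a new variant has appeared and someone needs to add another entry.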
I guess it depends on if sub-2-month feature development is needed to keep up with YouTube's changes or not.

I maintain a GCC code coverage tool on GitHub, and since GCC doesn't change very often and the feature set of the tool is fairly complete, I sometimes go 6+ months without commits. Usually I don't touch it unless someone opens an issue.

I mean, just look at <https://github.com/ytdl-org/youtube-dl/issues/29809>.

They don’t even bother removing spam.

Seriously. I've seen core/lead maintainers on projects I worked on go awol for months and years at a time, and come back guns blazing.
Dark thought, but I've had multiple instances in the last year or so where someone hasn't posted or tweeted in a while, and I wonder for a second, maybe they died? The pandemic is real, and random people disappearing unexpectedly is part of it. I hope all is well.
It's August. Could the maintainers be on holiday?