by ralferoo 658 days ago
It's very clear from that thread that he doesn't understand the purpose of the stable branch. It doesn't mean "stable" as in "the best possible experience", it means it as in "this code has been tested for a long period of time with no serious defects found" so that when the stable branch is promoted to release, everything has undergone a long testing period by a broad user base.

If there is a defect found, the change to a stable branch should literally be the minimal code change to fix the reported issue. Ideally, if it's a newly introduced issue (i.e. since being on the stable branch), the problematic code should be reverted and a different fix to the original defect applied instead (or the original defect left alone if it's deemed less of an issue than taking another speculative fix). Anything that requires a re-organisation of code, by definition, isn't a minimal fix. Maybe it's the correct long-term solution, but that can be done on the unstable branch; for the stable branch, the best fix is the simplest workaround. If there isn't a simple workaround, the best fix is to revert everything back to the previous stable version and keep iterating on the unstable branch.

The guy even admits it as well with his repeated "please don't actually use this in production" style messages - it's hard to give a greater indication than this that the code isn't yet ready for stable.

I can understand why from his perspective he wants his changes in the hands of users as soon as possible - it's something he's poured his heart and soul into and he strongly believes it will improve his users' experience. It's also the case that he is happy running the very latest and probably has more confidence in it than an older version. The rational choice from his perspective is to always use the latest code. But, discounting the extremely unlikely situation that his code is entirely bug free, that just means he hasn't yet found the next serious bug. If a big code change is rushed out into the stable branch, it just increases the likelihood that any serious bug won't have the time it needs in testing to give confidence that the branch is suitable for promotion to release.

4 comments

> The guy even admits it as well with his repeated "please don't actually use this in production" style messages - it's hard to give a greater indication than this that the code isn't yet ready for stable.

True that, and yet the kernel has zero issues keeping Btrfs around even though it's been eating people's data since 2010. Kent Overstreet sure is naive at times, but I just can't not sneer at the irony that an experimental filesystem is arguably better than a 15-year-old one that's been in the Linux kernel for more than a decade.

> True that, and yet the kernel has zero issues keeping Btrfs around even though it's been eating people's data since 2010.

I can imagine scenarios where known failure modes on an "inferior" tool are better than unknown failure modes on a "superior" one.

honestly, it's mostly just a matter of trying to give myself the time to work with bugs as people hit them, and stage the rollout. I don't want clueless newbies running it until it's completely bulletproof.

It seems to be a difficult situation: he has bug fixes against the version in the stable kernel for bugs which haven't been reported. I can see both perspectives: on stable you don't want to do development, but also you want all bugfixes you can get. I can also see the point of Linus, who wants just to add bug fixes and to minimize the risk of introducing new bugs.

Considering that Kent himself warns against general use right now, I don't quite see the urgency to get the bug fixes out - in my understanding Linus would happily merge them in the next development kernel. And whoever is set to run bcachefs right now might also be happy to run a dev kernel.

I agree. If the author himself is telling people it's not ready for production, it doesn't really matter what bugs this code has, unless it affects other subsystems or is a dangerous regression from the previous stable release.

If the bug it's fixing was already in the current release branch, wasn't noticed before and has only shown up now late in the stable branch lifetime, then it definitely doesn't seem like something that needs an urgent fix.

He is not submitting changes for stable. He is submitting non-regression fixes after the merge window. It's clear he understands the rules and the reasons for them but feels like his own internal development process is equivalent at reducing the chance of major regressions introduced in such a PR such that he can simply persuade Linus to let things go through anyway.

Whether this internal process gives him a pass for getting his non-regression fixes in after the merge window is at the end of the day for Linus to decide. And Linus is finally erring on the side of "Please just do what everyone else is doing" rather than "Okay, fine Kent, just this once".

I would say it's ironic to start a comment saying: "It's very clear from that thread that he doesn't understand the purpose of the stable branch" when it's "very clear" from your opening paragraph that you don't understand the thread.

Perhaps you might enlighten me how it's "very clear" from my opening paragraph that I don't understand the thread. Granted, the initial post could be interpreted a number of different ways, but having read the whole thread, I think I have a pretty good understanding of the intent. But clearly, you have a different interpretation, so please - enlighten me to your way of thinking.

Taken at its most charitable, the opening of the first message "no more bug reports related to the disk accounting rewrite, things are looking good over here as far as regressions go" would suggest a meaning of "there are no significant bugs, the changes below are optional".

The next section in the change description then says that this fixes a number of very serious bugs. Straight away, I can see the potential for an interpretation difference. Is it "heads up, no changes required" or "these fixes are critical"?

He's told "no" by Linus, for reasons that seem to correlate with what I said (unless you'd like to point out in what way I don't understand the thread), and then rather than saying "yeah, they can wait until the next stable branch", he doubled down on the importance of getting these changes in, basically saying that the rules should only apply to everyone else and not him because he knows that there won't be any new bugs because of $REASONS. $REASONS that didn't apply when the bugs were introduced. $REASONS that include automated testing, but that didn't find these bugs originally.

The thread (which apparently I don't understand) contains a perfect summary from Linus himself: "But it doesn't even change the issue: you aren't fixing a regression, you are doing new development to fix some old probl;em, and now you are literally editing non-bcachefs files too."

All this for some changes to a system that he's actively discouraging people from using because it's not production ready anyway, and so none of these bug fixes are actually critical for right now.

It's good he ultimately backs down, but he should never have been pushing for these changes this late in the stable branch timeline anyway.

So, that's my understanding of the thread. I'd be interested to hear how your understanding of the thread is so radically different from that.

> enlighten me

Fundamentally, on the whole I don't think most of your interpretation is comment worthy. (To clarify, I don't think it's particularly objectionable following from the premise in your opening paragraph.) But...

> in the stable branch timeline

Again. Like I outlined in my initial reply. This has nothing to do with stable. I don't know why you keep talking about stable.

The discussion is about bleeding edge mainline Linux. It's clear to me because:

* It is a PR for Linus. I don't know enough about stable to know if they use PRs but most stable stuff I do know about involves marking patches for stable on the specific mailing lists oriented around stable.

* Linus doesn't handle stable.

* Linus and Kent are talking about the merge window and Kent submitting non-regression fixing patches after it. This wouldn't make any sense in a stable context. The process is different.

* If this was stable the discussion would be with GKH.

So, your comment is based on the premise of this being a discussion surrounding stable. It's not, so I don't know what to make of the rest of your comment on the basis of this incorrect premise.

> So, your comment is based on the premise of this being a discussion surrounding stable. It's not, so I don't know what to make of the rest of your comment on the basis of this incorrect premise.

The repo is literally called "linux-stable-rc"

It wasn't an incorrect premise, just incorrect terminology. Sorry, my bad, I shouldn't have referred to it prematurely as "stable" when it is just undergoing the process of stabilization.

> > in the stable branch timeline

> This has nothing to do with stable. I don't know why you keep talking about stable.

Yes, technically, this isn't officially called "stable" until after the last release candidate, however every release candidate should be considered as an attempt to create the stable release (although pragmatically, nobody expects the first few to have had enough testing to surface all the bugs that are likely to show up) and I don't think it's particularly egregious to talk about this change in the context of the stable branch timeline as -rc releases are just as much part of the timeline as the initial stable release and the later point releases.

For context, this change was being requested for inclusion in -rc6, which was over 4 weeks after the merge window ended. This very well could end up being promoted to the stable release if no more significant bugs are found. There is no way a change of this complexity should have been accepted, and when Linus pointed that out, Kent shouldn't have been arguing about it at all, instead he should have just waited to get it merged into 6.12 as he originally intended.

> The discussion is about bleeding edge mainline Linux.

Yes, it's mainline, but also "bleeding edge" is kind of a misnomer, as it hadn't been accepting feature changes and was in stabilization for producing stable release candidates for a month already, and by that point would have had significant testing.

Sorry for causing confusion by referring to it prematurely as "stable". I don't look at the kernel all that often, and we use a different process with different terminology in our environment. We keep mainline open all the time for ongoing feature work, fork that to "stable" which only accepts bug-fixes and from that we periodically create release candidates which get released for testing and possibly get relabeled as the actual release. Sorry I was still thinking in that mindset when I replied and didn't properly map the concepts back to those used in the kernel.

> The repo is literally called "linux-stable-rc"

It's not? What repo? The only two repos which are involved are https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... which is mainline (implicitly) which is where these patches would land and git://evilpiepirate.org/bcachefs.git which is the source repo for the PR. The only branches being referenced are "master" (implicitly) for mainline and the tag "bcachefs-2024-08-23".

Regardless, to respond to the rest of your comment:

The reasons for why Linus is rejecting the change have nothing to do with the stable process and everything to do with the set release process. The mainline merge window opens, you (not you specifically unless you are a subsystem maintainer, if you want to contribute a patch as a non-maintainer, the process is completely separate and goes via the subsystem maintainers) submit features and bug fixes, the merge window closes, somewhere in the ballpark of 7 release candidates happen, and it's released as mainline. The goal of the RCs is to incorporate subsequent waves of fixes for any regressions introduced specifically by the bug fixes and new features.
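As a rough illustration of the cycle described above, here is a toy model in C. It is only a sketch of the default rule as this comment states it, not anything the kernel project actually runs; the enum and function names are made up for the example, and it assumes release candidates land roughly weekly with -rc1 marking the close of the merge window.

```c
#include <stdbool.h>

/* Phases of one mainline cycle, as described in the comment above. */
enum cycle_phase {
    PHASE_MERGE_WINDOW,      /* features and bug fixes get submitted */
    PHASE_RELEASE_CANDIDATES /* -rc1 onwards, until the release */
};

/* Hypothetical classification of an incoming change. */
enum change_kind {
    CHANGE_FEATURE,
    CHANGE_FIX_FOR_OLD_BUG,  /* a real bug, but not introduced this cycle */
    CHANGE_REGRESSION_FIX    /* fixes something this cycle's merges broke */
};

/*
 * Toy rule: anything may be submitted during the merge window; once the
 * release candidates start, only regression fixes fit the default process.
 * Exceptions are a maintainer's judgment call and are not modelled here.
 */
bool fits_default_process(enum cycle_phase phase, enum change_kind kind)
{
    if (phase == PHASE_MERGE_WINDOW)
        return true;
    return kind == CHANGE_REGRESSION_FIX;
}

/*
 * Assumption: one release candidate per week, with -rc1 closing the merge
 * window, so -rcN is about N - 1 weeks after the window closed.
 * Returns -1 for an invalid release candidate number.
 */
int weeks_after_merge_window(int rc_number)
{
    return rc_number < 1 ? -1 : rc_number - 1;
}
```

Under those assumptions, a fix for a long-standing bug sent during the release candidate phase does not fit the default process, and the -rc6 mentioned earlier in the thread comes out at about five weeks after the merge window closed.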

Kent is claiming that, because he himself implements effectively an equivalently rigorous (according to him) feature testing and stabilisation process, his patches, which do not fix regressions introduced by previous patches submitted during the merge window but which do fix some real bugs, should be accepted outside the merge window.

In the past, Linus has let it slide, and he has also let it slide this time too: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... . Linus is just asking Kent to stop doing this as he doesn't want to keep giving him special treatment.

I thought stable means "doesn't change"?

Not in the kernel land. Stable branches feature tens of thousands of patches.

Patches are expected but the kernel interfaces shouldn't change right? Like if I write a kernel module no patch should break my compatibility and make my module not build anymore (I think)? I don't care if it changes underneath as long as it doesn't change where I interface.

Userspace doesn't break, but if you don't want your module to break, upstream it (which is an important lesson about hardware selection: if it's not upstream and not being upstreamed, then you're going to get stuck on an old kernel at some point).

ZFS has broken on new releases (I don't recall if they were stable, I think they were), and that is one reason I won't use it as the main filesystem on Linux.

Usually. There are no hard rules though.

Upstream stable kernel certainly does not care about compatibility with your particular third-party module. You'll just have to add another KERNEL_VERSION #if. Maybe if you're NVIDIA, or something, things are different.
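For anyone who hasn't seen the pattern, a minimal sketch of what "another KERNEL_VERSION #if" tends to look like. In a real out-of-tree module, `KERNEL_VERSION` and `LINUX_VERSION_CODE` come from `<linux/version.h>`; they are redefined here only so the sketch builds in userspace. `example_register`, the compat macros, and the idea that its signature changed in 6.11 are all made up for illustration.

```c
#include <stddef.h>

/* Stand-ins for the macros normally provided by <linux/version.h>. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) \
    (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))
#endif

/* Hypothetical: pretend the module is being built against a 6.11.0 tree. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(6, 11, 0)
#endif

/*
 * Hypothetical in-kernel interface whose signature "changed" in 6.11.
 * Both variants are stubbed locally so the example is self-contained.
 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 11, 0)
static int example_register(const char *name, unsigned int flags)
{
    (void)flags;
    return name != NULL ? 0 : -1;
}
#define COMPAT_REGISTER(name) example_register((name), 0u)
#define COMPAT_VARIANT "two-argument"
#else
static int example_register(const char *name)
{
    return name != NULL ? 0 : -1;
}
#define COMPAT_REGISTER(name) example_register(name)
#define COMPAT_VARIANT "one-argument"
#endif

/* The rest of the module only ever goes through the compat wrapper. */
int compat_register(const char *name)
{
    return COMPAT_REGISTER(name);
}

/* Reports which branch of the #if was compiled in. */
const char *compat_variant(void)
{
    return COMPAT_VARIANT;
}
```

Each time an internal interface the module depends on changes, another branch like this gets added, which is the maintenance cost the comment is pointing at.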