by tmoertel 4749 days ago
The packagers actually do know what’s best. What they do makes patches flow faster not only downstream but also upstream. Improvements and fixes get to more people and get to them faster.

Unbundling upstream libraries from downstream projects flattens the change-flow network, reducing the time it takes for things to get fixed and for the fixes to propagate. For example, say that project P uses library L and bundles a slightly modified L in its release. Whenever L’s developers fix or improve or security-patch L, P’s users don’t get the new code. They have to wait for P’s developers to get around to pulling the new code from L, applying their own modifications, and re-releasing P.

Packagers say that’s crazy. They ask: Why does P need a modified L? Is it to add fixes or new features? If so, let’s get them into L proper so that L proper will not only meet P’s needs but also provide those fixes and new features to everyone else. Is it because P’s version of L is no longer L but in name? Then let’s stop calling it L and confusing everybody. Fold the no-longer-L into P or release it as a fork of L called M that can have a life of its own.

The point is that keeping L out of P makes two things happen: (1) It ensures that when L’s developers improve L, all users, including P’s downstream users, get those improvements right away. (2) It ensures that when P’s developers improve L, those improvements flow upstream to L quickly and reach all of L’s users, too.

More improvements, to more people, faster. That's the idea.

5 comments

I don't want to speak with any authority on this subject since it's mostly foreign to me. However, I will say that your explanation comes with a pretty big assumption: that a library sits on some one-dimensional spectrum from bad to good. If you don't subscribe to this notion, libraries don't improve; they simply change. If you accept this, then you begin to see why arbitrarily changing parts of a software package without even trying to understand the consequences of those changes is madness.

I'm not saying I disagree with you, just trying to point out a spot you might have overlooked.

Ok, I definitely have a bias toward what the grandparent is saying: flatten the hierarchy.

But just from a logical standpoint, couldn't you apply the following equally to the Riak guy who is complaining: "arbitrarily changing parts of a software package without even trying to understand the consequences of those changes is madness"

It absolutely is madness. If the Riak guys want to use leveldb in a way Google won't support, they should rally with the package managers and get Google to stop being "pretend open source." (Hint, Google: just releasing the source doesn't work if you ignore all bug reports and patches from outside.)

I suspect the real issue here is too much "Not Invented Here" syndrome by all parties involved.

That's silly. Leveldb is totally open source in every sense of the word. Just because you want to customize it in a way they didn't originally customize it doesn't mean they are "pretending" open source. The fact that the Riak guys are allowed to customize it in their own custom fork is largely because Google is not "pretending".

There are plenty of reasons to ignore patches from outside that are completely valid. Google gets to decide the direction of their fork of leveldb. If a patch doesn't fit that direction they are under no obligation to accept it.

It's not madness for Riak to want a divergent version of a package. Nor is it madness for the package maintainer not to want to take that package in the direction that Riak wants. This is why we have forks in the first place, and it's perfectly fine.

In short: no, it doesn't equally apply to the Riak guy. The packager is responsible for cutting boundaries in the proper place. If they don't want to do the work of investigating that, then they shouldn't package it.

> If the Riak guys want to use leveldb in a way Google won't support, they should rally with the package managers and get Google to stop being "pretend open source."

They don't have any control over Google, by "rallying" or otherwise.

Consider the recent case of the WebKit/blink split. Here you have two sets of some of the world's smartest engineers, who cannot agree about how to render a webpage! And this is a well-defined problem. There are actual standards about how to render webpages! In theory, everybody agrees about what's going on here, and yet it's fork time.

As for what Library X does, there are no standards, and nobody necessarily agrees about what they are building. And let's be honest here, you probably do not have WebKit-caliber developers hacking on Library X. So the chance that you can arrive at consensus for Library X is much lower than for WebKit/blink.

Meanwhile, instead of dicking around with the will-they-won't-they-merge-upstream committee, you can just ship software that works in practice to people that want to use it. If you are a software developer, and you have the choice between writing software and arguing about it, it is usually a good bet to write software.

Well put, and this particular problem can easily be handled by having a prominent doc section called "Information for packagers" that outlines all this stuff. This isn't a new problem, and it seems to be best handled by engaging with the packagers and putting a small amount of effort into helping them. It's easy and pays enormous dividends.
The "information for packagers" doc section is often referred to as a Makefile.

It enumerates minimum versions of shared libraries, as well as explicit versions of static libraries.

The problem isn't with minimum versions; the problem is with maximum versions, which are potentially unknown at the time a project is released. That is, the code works with the latest release of a library, but a future release of the library may break the project.

The only real way to avoid that is for a project to include a bunch of compliance tests that validate that an underlying library correctly performs the operations the project needs it to do. But this is actually a lot of work, so in reality it will almost never be done. Which leads us back to the discussion about the wisdom of packagers changing libs without understanding the effects on downstream projects.
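
To make the "compliance tests" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical project that relies on zlib for exactly three things: lossless round-trips, actual size reduction on repetitive data, and rejection of truncated input. The function name `check_zlib_contract` and the choice of checks are illustrative assumptions, not anything from Riak or leveldb.

```python
import zlib


def check_zlib_contract():
    """Check the few zlib behaviours this hypothetical project depends on.

    Raises AssertionError if the installed zlib breaks any of them.
    """
    payload = b"change-flow " * 1000

    # 1. Compression must be lossless.
    packed = zlib.compress(payload, 9)
    assert zlib.decompress(packed) == payload

    # 2. Highly repetitive data must actually shrink.
    assert len(packed) < len(payload)

    # 3. A truncated stream must be rejected, not silently accepted.
    try:
        zlib.decompress(packed[:-4])
    except zlib.error:
        pass
    else:
        raise AssertionError("truncated stream was accepted")

    return True


if __name__ == "__main__":
    check_zlib_contract()
    print("zlib still behaves the way this project expects")
```

A packager who swaps in a newer shared library could run a script like this before shipping. It only covers the behaviours someone bothered to write down, which is the "lot of work" part.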

They really don't.

This is part of the reason for the plethora of Linux distributions. Some deployments can afford the rapid pace (and consequent instability) of the short term Ubuntu releases or Fedora. Other deployments really do require the longer term stability of the more methodical Ubuntu LTS releases or CentOS / RHEL.

Any improvement requires change, but not all changes are an improvement.

I'm not sure I was able to get my point across to you. Let me try another approach.

The improvement I'm talking about occurs upstream of the distributions, even though it is caused by the distributions' packaging policies.

Libraries are upstream from projects, and projects are upstream from distributions. If the distributions discourage projects from bundling libraries, this policy will encourage project developers to talk to the upstream library developers to get desired changes into the libraries, rather than go the customize-and-bundle route. This improved coordination and patch-flow benefits the users of the libraries and the users of the projects, regardless of whether those users rely on any particular distribution to get the software. Users are, as always, still free to pick whatever distribution best suits their preferences, or no distribution at all. Still, they benefit from the distributions' debundling policy.

I might be able to comment better if the term "projects" were better defined. It just depends on, well, the nature of the dependencies. Most libraries depend on other libraries. Often at least three layers deep.

If you could explain how your philosophy would deal with, for example, nginx and Apache both depending on libssl, which itself depends on libcrypto, which depends on libz and libc (both of which are also separate independent dependencies of Apache and nginx) then maybe we could discuss it better.

Oh, and in theory I should be able to swap libssl for libgnutls arbitrarily. How do we handle that?

You're conflating the dependency graph with the change-flow network. The first represents how projects rely on other projects; the second represents how changes must propagate to reach all users. Once you understand the difference, you'll understand why debundling is the sensible response to the sea of large-scale interdependent software-development projects that characterizes most FOSS ecosystems.
If I understand you correctly, you're saying the dependency graph and the "change-flow network" are completely orthogonal.

Separation of concerns is a value of good software projects. But there are practical realities that the author of the article enumerates specifically.

If there is a tight coupling between his application and a handful of upstream libraries, packagers are far more likely to break his application by distributing the latest version of that shared library. Other applications that aren't as tightly coupled can handle that upgrade. Since it is tightly coupled, he's going to be highly attuned to the upgrade needs for his specific statically compiled version.

They're not orthogonal; they're two directed graphs with the same vertices and different edges.

If the dependency graph G = (V, E) has a vertex for every software project and an edge x -> y iff downstream project y depends on upstream project x, then the change-flow network is the graph C = (V, F), where there is an edge x -> y in F iff there is a downstream path from x to y in G and also y requires an update and re-release when x changes (e.g., because it bundles a copy of x in its releases).

So if there is a change to project x, for it to flow to all affected dependents, you must update all downstream neighbors of x in the change-flow network C.

For example, consider the following dependency graph, in which library L is used by downstream library L2, and L2 by project P:

    L -> L2 -> P

If none of the projects bundle their upstream dependencies in their own releases, then the corresponding change-flow network has no edges, and updating any project requires only re-releasing its own package to satisfy all dependencies:

    L
    L2
    P

But if L2 bundles a copy of L, and P bundles a copy of L2, then the corresponding network looks like this:

    L -> L2
    L -> P

    L2 -> P

    P

A change to L requires re-releasing not only L but also L2, and P. A change to L2 requires re-releasing L2 and also P.
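
The example above can be sketched in Python. This is a rough illustration under one stated assumption: a project needs a re-release when x changes only if it bundles x, directly or through a chain of bundled copies. Under that assumption, F is the transitive closure of the bundling edges. The names `change_flow` and `rereleases` are made up for this sketch.

```python
from collections import defaultdict


def change_flow(deps, bundled):
    """Compute the change-flow edges F from dependency edges.

    deps:    set of (upstream, downstream) edges of G
    bundled: subset of deps where downstream ships its own copy of upstream
    Assumption: only bundling forces a re-release, so F is the set of
    pairs connected by a path made entirely of bundling edges.
    """
    children = defaultdict(set)
    for up, down in bundled:
        if (up, down) not in deps:
            raise ValueError(f"{down} bundles {up} but does not depend on it")
        children[up].add(down)

    vertices = {v for edge in deps for v in edge}
    flow = set()
    for start in vertices:
        # Depth-first walk over bundling edges only.
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            flow.add((start, node))
            stack.extend(children[node])
    return flow


def rereleases(project, flow):
    """Everything that must be re-released when `project` changes."""
    return {project} | {down for up, down in flow if up == project}


deps = {("L", "L2"), ("L2", "P")}
no_bundling = change_flow(deps, set())
full_bundling = change_flow(deps, deps)
```

With nothing bundled, `no_bundling` is empty and `rereleases("L", no_bundling)` is just `{"L"}`. With everything bundled, `full_bundling` contains the three edges listed above and `rereleases("L", full_bundling)` is `{"L", "L2", "P"}`.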

Does that make more sense now?

Yeah, I love how the article is so myopic: the author can't imagine a world in which he might be using a package that some other package is also using, and which therefore might need to be upgraded separately from his package. So the author has worked on two large projects that have dependencies, yet he thinks he has the experience to say that splitting a package up (say, into docs, libs and executables) is a bad thing? How many embedded devices has he administered? Or clusters? Or simple networks where things are set up to have NFS mounts across machines, and it's obvious that while you can install the docs on the NFS-doc-server once, you may need to have separate binary and library installs for each architecture/OS on the NFS-binary-servers. There's a reason sysadmins love well-packaged software.
Sysadmin here.

Authors of well packaged software know when they need fine grained library features, and include static versions of that library. Authors of well packaged software also pay attention to the distribution of commonly used libraries, and make careful decisions when using system-provided shared versions of those libraries.

The author of the article is complaining when someone downstream overrides those decisions. If you're asking how many clusters one of the primary developers of Riak has run, you may not be reading closely enough.

Splitting a package into docs, libs and executables makes a lot of sense. Splitting those further, so you've got umpteen "independent" packages which 95% of users are just going to have to manually recombine to get the functionality the upstream package provides out of the box, can get pathological. Debian has historically been particularly bad at this, and Ubuntu inherited that tendency.
Please give an example of a pathological case in Debian.
Ruby in etch was pretty absurd, from memory.
In principle I agree. However, in my experience, the time from my submitting a patch to L that I need for my new feature in P to work until that patch makes it into a stable release of L that packagers actually ship can be months.

In that time frame I'm stuck between not shipping a new version of P (often unacceptable as I've got users to answer to) or shipping my own slightly modified version of L.