Hacker News
by oblio 3077 days ago
Either the Curl developers are at fault somewhere, which I somehow doubt, or distributions are really special snowflakes, which I also doubt, or software distribution in the Open Source world is, in my opinion, flawed:

> Finally, I’d like to add that like all operating system distributions that ship curl (macOS, Linux distros, the BSDs, AIX, etc) Microsoft builds, packages and ships the curl binary completely independently from the actual curl project.

Why would everyone rebuild it? There are some security considerations (matching source and binary; disabling "dangerous" stuff) and some feature considerations (disable stuff you don't need to reduce resource usage - maybe), but conceptually this seems so wrong to me.

Conceptually I'd want downstream packagers to talk to upstream developers so that upstream has reasonable defaults and settings and I'd want packagers to just package and make the package follow distribution conventions. But rebuild seems overkill.

Maybe I'm missing something obvious?

10 comments

Almost all Linux distributions rebuild upstream software from source -- this ensures everything is built from the same toolchain (gcc/libc etc), and that the binaries distributed match the source.

It also allows for ease of patching in a stable release -- generally it's preferred to just fix specific high-impact bugs rather than moving to a new upstream version, which might introduce regressions.

(Context: I'm a Debian developer and on the Ubuntu MOTU team)

And then there is the whole dependencies shitstorm, where far too many upstreams have a bad habit of breaking APIs etc as they see fit.

There are ways to work around it, but it gets messy quickly. And rather than clean up their act they start championing things like Flatpak, which is basically a throwback to the DOS days of everything living in its own folder tree with a bit of souped-up chroot thrown on top.

I really expect that if the likes of Flatpak become mainstream in the Linux world, a flaw found in a lib somewhere will produce a stampede of updates, because every damn project crammed in a copy to make sure it was present.

"Almost all Linux distributions rebuild upstream software from source"

On Gentoo et al, the end user does the building. OK, yes the ebuilds are recipes but I've lost count of the times I've used epatch_user (https://wiki.gentoo.org/wiki//etc/portage/patches). You have a near infinite choice of ways to destroy your system, what with USE flags, mixed stable/unstable and all the other crazy stuff. Despite that, my Gentoo systems have been surprisingly stable.

In winter an update world session on a laptop keeps you (very) warm 8)

(wrt "Context": Ta for your work)

In order to make sure you actually have the freedom to modify the software, you want packagers to have the ability to rebuild the package from source, preferably using just their OS as a build environment rather than a black-box build environment from upstream (like a Docker image configured just so), and to make sure they end up with an equivalent binary. Even if there's nothing to patch today, there might be something to patch tomorrow.

I have many times seen Debian packagers try to build something from upstream and find that it just does not build anywhere other than the maintainer's computer. The fact that Debian requires that every package it ships can be built by anyone in a generic environment is immensely valuable to free software, even if nobody used the binaries that Debian built. (And to be clear, other distros do the same, I'm just most familiar with Debian.)

I'd agree that in an ideal world, all the patches would be upstream and the binary would not just be equivalent but bit-for-bit reproducible. Some practical reasons why it wouldn't be are that various dependencies are of slightly different versions (e.g., one distro manages to get a new libc uploaded a little bit before another), that downstream conventions are different in different distros (e.g., Red Hat-style distros use /lib64 and Debian-style distros use /lib/x86_64-linux-gnu), or that a dependency has some shortcoming which many but not all distros patch in the same way, and the patch impacts things that use the dependency (e.g., upstream OpenSSL <1.1 does not have symbol versions, but most Linux distros patch them in). Yes, in an ideal world, all these things would be resolved, but there are going to be so many tiny things like this that come up that having infrastructure to accommodate them is the right plan.

Especially in the Linux world, you can't expect upstream to supply binaries for all possible architectures and configuration options. For example, you might be running on armv5 with LibreSSL. Or you may be running on sparc64 with OpenSSL. Or you may be running on 32-bit Windows with WinSSL. An upstream is unlikely to have the means to build all possible configurations and provide binaries every time a security patch is announced.

Also, as a distro provider, you will want to be sure you can build the application yourself, because you might want to ship an updated library dependency that is ABI-incompatible and so you must be able to rebuild the consumers of these libraries. For example curl, in the case of openssl.
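To make the "rebuild the consumers" point concrete, here is a minimal sketch of how a distro might work out which packages need a rebuild after an ABI-incompatible library update. The package list and the `LINKS_AGAINST` table are made up for illustration; a real distro would take this information from its package metadata.

```python
# Hypothetical data: package -> libraries it links against directly.
LINKS_AGAINST = {
    "curl": ["openssl", "zlib"],
    "git": ["curl", "zlib"],
    "wget": ["openssl"],
    "less": [],
}

def consumers_of(lib, links_against):
    """Return the packages that link directly against `lib`, sorted by name."""
    return sorted(pkg for pkg, deps in links_against.items() if lib in deps)

# After shipping an ABI-incompatible openssl, these need a rebuild.
print(consumers_of("openssl", LINKS_AGAINST))
```

With this toy table, an openssl bump selects curl and wget, which is only possible if the distro can build both from source itself.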

Have you ever been an upstream? What you're suggesting is awfully wasteful. You'd get endless requests for "please add build for XXX" if you ever distribute any binary. Where XXX is anything from arbitrary versions of arbitrary distributions to any kind of non-Linux OS. And all that for i686, x86_64, various ARMs,... Who has time for that? If you state you only distribute source code and perhaps a binary for Windows, you will not get bothered ever again.

Better to let the building be done by people who are doing builds all day en masse for a single arch, or who have infrastructure set up to do builds for multiple archs easily, than to expect every upstream to have this setup.

I can think of a few reasons

(1) Use of a different libc (alpine linux with musl)

(2) most builds are not reproducible, rebuilds are needed for security reasons

(3) non-rolling release distros (-> most distros) fork the upstream projects to backport fixes for their older releases

(4) Different filesystem structure (Gobo Linux)

(6) Most distros want to use the build system associated with their own package manager

There are probably many more than that. Most distros don't even try to stay close to the upstream repo and instead maintain a lot of patches.

In the future we will most likely have a distro-specific basic system build and container apps (snap, appimage, flatpak) built directly by upstream on non-server systems.

> In the future we will most likely have a distro-specific basic system build and container apps (snap, appimage, flatpak) built directly by upstream on non-server systems.

I sure hope not. A better system would be something like Nix/Guix or even GoboLinux, which give us a single package as today, but with the option of installing multiple versions in parallel if upstream has screwed up the API (again).

Flatpak and the like will just be an excuse for upstream to bundle everything and the kitchen sink, resulting in bloat and having to update a mass of paks rather than individual libs in case a flaw is found.

You've got good points and I admit I'm split on this issue.

In theory I would rather have everything handled by the package manager, but container apps provide a few useful advantages:

  - sandboxing/isolation out of the box
  - being able to report bugs directly to upstream
  - It's easier to distribute a small app to all distributions (for some value of all)*
  - much easier to distribute proprietary applications (subjective advantage)
  - much easier to install old versions of an application (sometimes needed)
  - it's possible to record the bandwidth/RAM/CPU usage; with a standard package that's quite difficult

7) CPU architecture,

8) output binary format (though these days people generally only use ELF or PE),

9) compile time options (eg some packages will allow you to choose which TLS library to use at compile time)

10) hardware specific optimisations

Basically just a plethora of reasons

> Maybe I'm missing something obvious?

There are many different package managers depending on which Linux you're running.

Debian and derivatives use apt, Red Hat uses yum or dnf, SUSE uses zypper (with YaST as a front end), Gentoo has emerge, Arch has pacman... So each package manager needs to build its own package, and it's easier to recompile from source than slice and dice a binary.

Also, distributions will install the binaries, shared libraries, man pages, etc. to different locations (some to /usr, some to /usr/local, etc.), which is also easier to define at configuration time since most autoconf/make files support this already.
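As a rough illustration of picking install locations at configure time, the sketch below assembles a `./configure` invocation for a few layouts. `--prefix` and `--libdir` are standard autoconf options; the layout names and the exact paths in `LAYOUTS` are assumptions chosen for the example, not any distro's actual policy.

```python
# Hypothetical layouts; the names and paths are illustrative only.
LAYOUTS = {
    "debian-style": {"prefix": "/usr", "libdir": "/usr/lib/x86_64-linux-gnu"},
    "redhat-style": {"prefix": "/usr", "libdir": "/usr/lib64"},
    "local": {"prefix": "/usr/local", "libdir": "/usr/local/lib"},
}

def configure_command(layout_name):
    """Build the argument list for an autoconf-style configure script."""
    layout = LAYOUTS[layout_name]
    return [
        "./configure",
        f"--prefix={layout['prefix']}",
        f"--libdir={layout['libdir']}",
    ]

for name in LAYOUTS:
    # Print the command a packager would run for each layout.
    print(name, "->", " ".join(configure_command(name)))
```

The same source tree yields a differently laid out package per distro, with no patching of the resulting binary.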

Finally, they might want to add patches for distribution-specific features or quirks. Or maybe they compile with uClibc instead of glibc.

There are many valid reasons why distributions would and should take the upstream source and build/package it themselves.

Really, why would everyone not rebuild it?

People have pointed out tons of reasons why distros do their own builds, but really I don't understand what would possibly be a reason not to!?

The build system itself also is just a piece of software that gets distributed along with the source of the software to be built, and just as you can download and run curl and get a predictable result (the download of some URL), you can download and run the curl build system and get a predictable result (a cURL binary).

So, in that regard, what does it matter what execution of the cURL build system your cURL binary came from?

In particular with the trend towards reproducible builds, where the build result will be bit-identical between different runs of the build system if it's using the same compiler (version), I just don't see why you care!?
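A toy sketch of what "bit-identical" means in practice: hash two build outputs and compare. `fake_build` is a stand-in invented for this example (a real check would hash the actual binaries from two independent builds), and the embedded timestamp is just one well-known way a build ends up non-reproducible.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def is_reproducible(build_a: bytes, build_b: bytes) -> bool:
    """Two builds are bit-identical exactly when their digests match."""
    return sha256_of(build_a) == sha256_of(build_b)

def fake_build(source: bytes, timestamp: str = "") -> bytes:
    """Hypothetical 'compiler': output depends on the source and,
    optionally, on a build timestamp baked into the artifact."""
    return b"BIN:" + source + timestamp.encode()

src = b"int main(void) { return 0; }"

# Same source, nothing environment-specific embedded: identical output.
print(is_reproducible(fake_build(src), fake_build(src)))

# Same source, but each build embeds its own timestamp: outputs differ.
print(is_reproducible(fake_build(src, "2018-01-01"), fake_build(src, "2018-01-02")))
```

If the first case holds for real builds, it stops mattering whose machine ran the build system.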

Yeah, distros shouldn't just modify the software they package willy-nilly, but that's completely orthogonal to whether they should do their own builds. There are many reasons for applying small changes to enable integration with the distro, and in particular it's just unreasonable to expect that every one of the thousands of upstream authors of Debian packages, say, operates machines of all ten CPU architectures that are currently supported by Debian so they can provide binaries for all architectures. So, if you want to be able to distribute software written by people who don't happen to have a System z or a MIPS machine in Debian, maintainers have to prepare packages in such a way that binaries for those architectures can be built--and if they do that, it's trivial to also build all the other architectures from the same source. Adding special cases for when a binary is already available for some architecture from upstream would be just completely pointless complexity.

Apart from various technical reasons, the workflow of essentially all Linux package managers is built on the notion that the binary package is built by some well-defined (and repeatable, at least in the "does same thing" sense) process from some kind of source package. Also, the source package format usually contains a mechanism that allows distributions to comply with copyleft licenses while also cleanly documenting which modifications were made to the package from the upstream version (it is a sort of special-purpose versioning system).

Rebuild is necessary for things as trivial as changing the default install path. It's absolutely standard; a Linux distro that doesn't rebuild packages somewhere would almost not count as a distro.

Hmm. All the various Ubuntu derivatives (eg, Hannah Montana Edition or Christian Edition, or even Kubuntu and MATE Editions) that change some defaults without rebuilding packages... don't count as distributions?

This is somewhat tongue-in-cheek, but it's actually a question I don't have a solid yes or no answer to. I can see it both ways.

If your users are pulling software direct from the upstream distribution then you're not a separate distribution of Linux, you're a 'spin', 'edition', or an installer.

This does get a little murky when some of the packages are distributed directly but others are pulled from upstream, as with Antergos, but there the changes are so minor that I would still consider it an Arch spin.

For Windows, at least, the big reason to build the binaries would be to sign them with the Microsoft certificate, so that you know that the binary is authentic.