by drewg123 4049 days ago
Self-contained packages are not a new idea. For example, PC-BSD has been doing this for years via its PBI package format. See the description of PBI here: http://www.pcbsd.org/en/package-management

I think PBI does de-duplication at the package manager level by manipulating hard-links to common files, rather than installing multiple copies.
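A minimal Python sketch of file-level deduplication with hard links, in the spirit of what's described above (this is not PBI's actual code): hash every regular file under a directory and, when a later file has the same SHA-256 digest as an earlier one, replace it with a hard link to the first copy. The function name and directory layout are hypothetical. It assumes everything lives on one filesystem (hard links can't cross filesystems) and it ignores differing permissions or ownership between duplicates.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_with_hardlinks(root: Path) -> int:
    """Replace byte-identical regular files under root with hard links to
    the first copy seen. Returns how many files were replaced."""
    first_seen: dict[str, Path] = {}  # digest -> first path with that content
    replaced = 0
    # Materialize the listing first so the temporary links created below
    # don't show up mid-iteration.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.is_symlink():
            continue
        original = first_seen.setdefault(file_digest(path), path)
        if original == path or os.path.samefile(original, path):
            continue  # first copy, or already hard-linked to it
        # Link under a temporary name, then atomically rename over the duplicate.
        tmp = path.with_name(path.name + ".dedup-tmp")
        os.link(original, tmp)
        os.replace(tmp, path)
        replaced += 1
    return replaced
```

Files whose contents differ, such as two different builds of the same library, are left as separate copies.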

> I think PBI does de-duplication at the package manager level by manipulating hard-links to common files, rather than installing multiple copies.

Which is, itself, a bad reinvention of Plan 9's Venti filesystem. Having one, or two, or a million files on disk containing the same data should take up as much space as having just one. "Hard links" are a policy-level way to express shared mutability; deduplication of backing storage, meanwhile, should be a mechanism-level implementation detail.
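To make the mechanism-level idea concrete, here is a toy content-addressed block store in Python. It borrows Venti's core idea of addressing each block by a hash of its contents (Venti used SHA-1 "scores"), so writing the same data twice stores it only once. The class name, the fixed block size, and the in-memory dict are illustrative assumptions, not Venti's on-disk format or API.

```python
import hashlib


class ContentAddressedStore:
    """Toy store: each block is keyed by the SHA-1 of its contents,
    so identical blocks are kept exactly once."""

    def __init__(self, block_size: int = 8192) -> None:
        self.block_size = block_size
        self.blocks: dict[bytes, bytes] = {}  # score (hash) -> block data

    def put_block(self, data: bytes) -> bytes:
        score = hashlib.sha1(data).digest()
        self.blocks.setdefault(score, data)  # write-once: duplicates are no-ops
        return score

    def write_file(self, data: bytes) -> list[bytes]:
        """Split data into fixed-size blocks; a 'file' is just its list of scores."""
        return [self.put_block(data[i:i + self.block_size])
                for i in range(0, len(data), self.block_size)]

    def read_file(self, scores: list[bytes]) -> bytes:
        return b"".join(self.blocks[s] for s in scores)

    def stored_bytes(self) -> int:
        """Bytes actually held, after deduplication."""
        return sum(len(b) for b in self.blocks.values())
```

Here, how many names refer to the same data is a separate question from how the data is stored: a million "files" with identical contents cost one set of blocks plus a million score lists.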

ZFS has support for block-level deduplication and it comes with heavy memory and performance requirements. File-level deduplication with hard links is lightweight and requires no special support (besides a filesystem which supports hard links, obviously).
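The "shared mutability" of hard links is easy to demonstrate: two directory entries point at one inode, so an in-place write through either name is visible through the other. A small Python sketch, assuming a POSIX filesystem that supports hard links (the file names are hypothetical):

```python
import os
import tempfile


def hardlink_shares_mutations() -> tuple[int, bytes]:
    """Create a file, hard-link it, write through the link, then read back
    through the original name. Returns (link count, contents seen)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "libfoo.so")
        link = os.path.join(d, "libfoo-copy.so")
        with open(original, "wb") as f:
            f.write(b"v1")
        os.link(original, link)  # second directory entry, same inode
        with open(link, "r+b") as f:  # modify in place through the link
            f.write(b"v2")
        nlink = os.stat(original).st_nlink
        with open(original, "rb") as f:
            return nlink, f.read()
```

This is why file-level dedup by hard links is only safe if files are never modified in place. A tool that updates a file by writing a new one and renaming it over the old name breaks the link instead of changing the shared copy.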
Well, there is something to be said for the 'worse is better' approach.
Self-contained packages are not the problem; individualized libraries shipped along with those packages are.

How does PBI handle minor library version differences, then? If one package provides and uses mylib-1.3.1 and another provides and uses mylib-1.3.5, how is that distinguished at the core library level (the plain .so file)? My understanding is that Clear Linux is attempting to allow this level of granularity, so that a package (really an amalgamation of individual packages, in the sense of most current Unix-like distros) is functional and updated as a whole.

I believe that PBI works the same way. It allows different packages to independently use multiple versions of the same lib simultaneously. There is deduplication via hardlinks ONLY when the checksums match.
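One way separate minor versions can coexist even when they share a file name (both 1.3.1 and 1.3.5 might install as libmylib.so.1) is to search each package's own lib directory before any shared system directory, similar in spirit to an ELF RPATH/RUNPATH of $ORIGIN/../lib. The Python sketch below is a hypothetical model of that lookup order, not a description of how PBI or the real dynamic loader resolves libraries; the directory names are assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve_library(soname: str, package_dir: Path,
                    system_lib_dirs: list[Path]) -> Optional[Path]:
    """Return the first file named `soname`, checking the package's own
    lib/ directory before the shared system directories (hypothetical layout)."""
    search_path = [package_dir / "lib", *system_lib_dirs]
    for directory in search_path:
        candidate = directory / soname
        if candidate.is_file():
            return candidate
    return None  # not found anywhere on the search path
```

Each package gets its own bundled copy when it ships one and falls back to the shared copy otherwise; identical bundled copies can still be hard-linked together underneath.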

That's what made it attractive to me: I've painted myself into a corner several times when trying to install Ubuntu PPAs that want conflicting versions of shared libs.

I'm almost of the mindset that shipping userspace apps as LXC containers, each with everything the app needs, may be better for the most-used applications...

Given how relatively cheap even fairly big SSDs are, is it really worth the storage savings for your browser to share a couple .so files with your mp3 player?

I actually like how PC-BSD PBI packages work... given how many solutions have been built to work around the issue and reduce space, I'm not sure it's always worth it. At least not in the desktop space.

Imagine the problem of tracking down all the different versions of a library when an exploit is found. If you have 20, or even 50 different apps that bundled openssl, imagine the hassle of making sure each one was vetted and updated as needed, not to mention the delay in getting all the different packages rebuilt and pushed (which may be a small delay, or may not, depending on the vendor).
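A rough Python sketch of the first step of that hassle: finding every bundled copy of a library under an applications directory and grouping the copies by content hash, so you can see how many distinct builds need to be vetted. The libssl.so* pattern and the directory layout are assumptions. Symlinks are skipped so that a libssl.so link pointing at the real file isn't counted twice, and libraries linked statically into a binary won't show up at all.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_bundled_copies(root: Path, pattern: str = "libssl.so*") -> dict[str, list[Path]]:
    """Map SHA-256 digest -> every regular file under root matching
    pattern whose contents have that digest."""
    copies: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            copies[digest].append(path)
    return dict(copies)
```

The number of keys is the number of distinct builds to check; the length of each list is how many apps ship that particular build.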
You regularly use 20 or 50 end-user applications that use openssl?

I'm not talking the low-level OS applications here... I'm talking end-user applications and major exposed services.

For that matter, each of those applications needs to be updated, vetted and packaged... it's a matter of the level and completeness of packages.

What's considered an end user application? Installed languages (Perl, Python, Ruby, etc)? Would you consider all of regular userspace one "app", or split it into multiple chunks (dev tools, web tools, etc)? Wget, curl and chrome all use openssl.

smtpd? httpd? sqld? sshd? ntpd?

This may be illuminating. It's the list of RPMs that have a requirement containing the string "ssl":

  # for RPM in $(rpm -aq --qf '%{NAME}\n'); do rpm -qR "$RPM" | grep -iq ssl && echo -n "$RPM "; done

python-ldap libssh2 mailx Percona-Server-server-55 abrt-addon-ccpp libfprint qpid-cpp-client-ssl perl-Crypt-SSLeay ntp httpd-tools openssh openssl-devel pam_mysql redhat-lsb-core Percona-Server-client-55 sssd-common perl-IO-Socket-SSL qt squid openssl098e elinks ipa-python python-nss compat-openldap Percona-Server-shared-55 systemtap-runtime perl-Net-SSLeay python-libs Percona-Server-shared-compat openssl systemtap-client qpid-cpp-server-ssl certmonger python-urllib3 openssh-server nss_compat_ossl openldap percona-toolkit pyOpenSSL git sssd-common-pac wget ntpdate openssh-clients openssl systemtap-devel postfix nss-tools perl-DBD-MySQL libcurl curl sssd-proxy libesmtp

That's just for SSL, which, while used in many applications and services, is mostly limited to items that communicate externally. What about a core library that everything uses? tzdata is updated often, and we want correct time representations, right? Gzip is used by a lot of applications. What about glibc?

It's not an easy problem, but that's why I'm interested in how it turns out.