by fullwedgewhale 4000 days ago
One thing that strikes me is the explosion in dependencies in most software. I'm guilty of this, too. I've seen plenty of examples where an entire library or framework is added to a project just for a couple of features. Add a few libraries like that and suddenly you have a few megabytes of additional libraries, where maybe 90% or 95% of the features will never be used. A good article a while back looked at common unix utilities, comparing the size of commands like cp from the 1980's to the present. Most of the bloat had to do with features that almost nobody ever uses. It wouldn't be so bad if everyone used the same set of libraries. For example, almost all applications have a dependency on certain core libraries like libc.

But we often use different libraries that do essentially the same thing, or different versions of the same library, so instead of 1 copy of libfoo.jar, I have 2 copies of libbar.jar and 4 copies of libfoo.jar that may all do essentially the same thing. Then I have essentially the same functionality in C++ (some libraries that wrap collections) and in Python (where maybe one of the Python versions wraps one of the C++ libraries, but a different version). And of course I have a version installed in each Ruby environment. Add to that their dependencies, and the dependencies' dependencies, and you have a perfect storm of craptastic. So libfoo.jar version 1.2.3 depends on libbaz.jar 2.3.4, which depends on libqux 1.5.7. Let's say each one is 250k, and all I ever used was some list sorting utility in libfoo.
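
To put numbers on that chain, here is a minimal Python sketch. The graph and the 250k size come straight from the example above; the function name is made up for illustration, and a real build tool would also deal with version conflicts and shared dependencies.

```python
# Hypothetical dependency graph, copied from the libfoo example above.
DEPENDS_ON = {
    "libfoo 1.2.3": ["libbaz 2.3.4"],
    "libbaz 2.3.4": ["libqux 1.5.7"],
    "libqux 1.5.7": [],
}
SIZE_KB = 250  # assumed size of every library, as in the example


def transitive_closure(root, graph):
    """Return root plus everything it pulls in, directly or indirectly."""
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, []))
    return seen


closure = transitive_closure("libfoo 1.2.3", DEPENDS_ON)
total_kb = len(closure) * SIZE_KB
print(sorted(closure), total_kb)
```

So one sorting utility costs three libraries and 750k under these assumptions.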

But I don't know what we could really do about it. You can't force everyone to program in C++ or limit them to a set of blessed libraries. I think maybe developers could be more judicious about when they could add a few lines of code and when they actually need to bring in a hard dependency on an external library. And it happens with commercial software as well. Maybe this is just the way the world will be.

4 comments

This is one of the biggest sources of bloat. In the node and ruby ecosystems especially, dependencies proliferate exponentially, where an application pulls in 12 libraries, each of which pulls in 12 of its own, which each pull...
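
A rough back-of-the-envelope sketch of that growth, in Python. The fan-out of 12 is the figure from the paragraph above; the assumption that no two packages ever share a dependency is mine, so treat the result as an upper bound rather than a measurement of any real project.

```python
def worst_case_packages(fan_out, depth):
    """Upper bound on installed packages when no dependency is shared.

    Level 1 has fan_out packages, level 2 has fan_out**2, and so on.
    """
    return sum(fan_out ** level for level in range(1, depth + 1))


for depth in (1, 2, 3):
    print(depth, worst_case_packages(12, depth))
```

In practice package managers deduplicate shared dependencies, so real trees are smaller than this, but the shape of the curve is the point.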

Downloading the dependencies for Ghost, the node blogging platform with the explicit goal of simplicity and minimalism, takes me minutes.

Compare this with the status quo when writing programs in C, where you might link to 4 libraries total, one of which pulls in 2 others as dependencies.

I've come to suspect that the super convenient package managers that all the "modern" languages have are at fault for this.

And you're giving what would be described as the exact opposite example to the parent post: why do node projects have hundreds of dependencies? Because those dependencies do exactly 1 thing most of the time (and usually pull in some other exactly-1-thing dependencies to do it).

That may be true in Node, but as a counterexample, in Ruby I saw dependencies creep into projects over some minor point like "I need to do X", where X is done by library Y. But gem Y also does A, B, C and D, and in order to do all that it drags in several dependencies which are not directly needed. When you actually look at X, you realize it's not that difficult to do yourself. So at what point should you just write the functionality yourself, and at what point do you rely on external libraries (and any baggage they can bring with them)? If you write it yourself, you have to maintain that code (even if it is fairly trivial), but if you use the gem, you have to maintain your dependency (keeping the gem up to date, maybe making small code changes to accommodate breaking changes in the gem). It can be a real mess.

Dart makes a decent argument for a smarter compiler. They've implemented "Tree Shaking" in their compiler: essentially cross-library dead code elimination.

This would probably be quite tricky in Java land where reflection does add new entry points, but it could be used to solve the problem of "I only need this one function from this library, don't compile in anything else".
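
For anyone who hasn't seen the idea, tree shaking boils down to a reachability walk over the call graph. A toy Python sketch follows; the function names and the graph are invented for illustration, and this is not how Dart's compiler is actually implemented.

```python
# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "app.main": ["libfoo.sort_list"],
    "libfoo.sort_list": ["libfoo._compare"],
    "libfoo._compare": [],
    "libfoo.parse_xml": ["libbaz.tokenize"],
    "libbaz.tokenize": [],
}


def shake(entry_points, calls):
    """Return the set of functions reachable from the entry points."""
    live = set()
    work = list(entry_points)
    while work:
        fn = work.pop()
        if fn in live:
            continue
        live.add(fn)
        work.extend(calls.get(fn, []))
    return live


live = shake(["app.main"], CALLS)
dead = set(CALLS) - live  # candidates for removal
print(sorted(live), sorted(dead))

# Reflection is modelled here as an extra entry point the compiler cannot
# see statically; once it has to be assumed, less code can be dropped.
live_with_reflection = shake(["app.main", "libfoo.parse_xml"], CALLS)
```

With only `app.main` as an entry point, the XML parser and the tokenizer are dead. Once reflection might call `libfoo.parse_xml`, nothing in this toy graph can be removed.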

I was personally quite surprised when I ported some code from Node to Java/Groovy and the resulting shaded JAR was > 70MB; I think at some point it peaked above 110MB. I don't know what I changed, but it's down to 35MB now. The code that we've written in house on that codebase boils down to 1MB. But apart from teaching me that I don't want to scp local builds to staging (because scp is terrible), these numbers are all completely equivalent for writing server-side software that runs on dedicated machines.

We could certainly make it more efficient, but there's exactly zero business case for it.

You can only tree-shake a whole program compilation, but then you cannot use compilation units, modules and modularity efficiently. You have to choose one or the other.

Every normal compiler implements simple (i.e. module level) dead-code elimination already.

EDIT: Of course you could use static libs, which do pull in only used symbols, but then you cannot share them across apps and update them independently.

I implemented a tree shaker for my lisp and was very happy with it, esp. for delivery. Like Go does it nowadays.

Right, I guess I wasn't clear, this was a shaded/fat jar, so it had all its dependencies included statically.

I feel like our computing infrastructure has gotten to the point that dynamically linked libraries are no longer a good choice. I think dynamic linking has only caused us problems at work (devs install Node deps on the staging server, forget to tell ops, service crashes when deployed in prod), and the memory/disk/transfer overhead are practically irrelevant at this point. The only remaining reason to have dynamic libs is the idea that they can be updated without help from upstream, but that really only works if the software is compatible with the latest libraries, which isn't always true.

Supposedly ProGuard has some cross-module dead code elimination for JARs, but I haven't tried it: http://proguard.sourceforge.net/

> I ported some code from Node to Java/Groovy

How much of that was Java's fault and how much Groovy's, I wonder?

Given my own code was only 1MB total, neither. 97% of the size was from external dependencies. FWIW, it was a "shaded" jar that had all its dependencies linked in statically.

> A good article a while back looked at common unix utilities, comparing the size of commands like cp from the 1980's to the present. Most of the bloat had to do with features that almost nobody ever uses.

That was actually a talk called "Bloat: How and Why UNIX Grew Up (and Out)": https://www.youtube.com/watch?v=Nbv9L-WIu0s

Well worth a viewing.

We can write tools that stop using idiotic ideas like dynamic libraries and only link in symbols that apps need.

If you only use one or two features out of a lib, why are you dynamically linking them in? If you do a static link, the linker can at least remove most of the bloat you don't use.
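
A simplified Python model of how a traditional static link picks code out of an archive: only the object files that define a still-undefined symbol get pulled in. The archive contents and names below are hypothetical, and real linkers have more rules (archive ordering, section-level garbage collection), so this is a sketch of the idea only.

```python
# Hypothetical static archive: member -> (symbols defined, symbols referenced)
ARCHIVE = {
    "sort.o": ({"sort_list"}, {"compare"}),
    "compare.o": ({"compare"}, set()),
    "xml.o": ({"parse_xml", "write_xml"}, {"tokenize"}),
    "token.o": ({"tokenize"}, set()),
}


def link_members(undefined, archive):
    """Select the archive members needed to resolve the undefined symbols."""
    pulled = []
    resolved = set()
    pending = set(undefined)
    progress = True
    while pending and progress:
        progress = False
        for member, (defined, referenced) in archive.items():
            if member in pulled:
                continue
            if defined & pending:
                pulled.append(member)
                resolved |= defined
                # New references may need further members on a later pass.
                pending = (pending | referenced) - resolved
                progress = True
    return pulled


print(link_members({"sort_list"}, ARCHIVE))
```

An app that only calls `sort_list` ends up with `sort.o` and `compare.o`, and the XML code never makes it into the binary. Note the granularity is the object file: asking for `parse_xml` would also bring in `write_xml`, because they live in the same member.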