Hacker News
by PoignardAzur 404 days ago
> At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

I'm not convinced that happens that often.

As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the time it turned out that the dependencies were downstream of things we needed: Vulkan support, PNG decoding, Unicode shaping, etc.

When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).
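For what it's worth, once_cell is a typical case: its core type was upstreamed, and `std::sync::OnceLock` (stable since Rust 1.70) now covers the common "global initialized on first use" pattern without an extra crate. A minimal sketch, assuming that pattern; the `UNITS` table is purely illustrative and not how Xilem actually used once_cell:

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// A lazily initialized global. This used to be a common reason to pull in
// once_cell; std's OnceLock handles it with no extra dependency.
// (Hypothetical example data, not taken from any real crate.)
static UNITS: OnceLock<HashMap<&'static str, u64>> = OnceLock::new();

/// Returns the shared table, building it only on the first call.
fn units() -> &'static HashMap<&'static str, u64> {
    UNITS.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("KiB", 1024);
        m.insert("MiB", 1024 * 1024);
        m.insert("GiB", 1024 * 1024 * 1024);
        m
    })
}

fn main() {
    // Every call after the first returns the same already-built table.
    println!("1 MiB = {} bytes", units()["MiB"]);
}
```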

We're looking to remove or at least decouple larger dependencies like winit and wgpu, but that requires some major architectural changes; it's not just "remove this runtime option and win 500MB".


I was very 'impressed' to see multiple SSL libraries pulled into Rust software that never makes a network connection.
This is where a) a strong stdlib and b) community consensus on common packages tend to help mitigate the problem, at least.

My feeling is that Python scores fairly well in this regard. At least it used to. I haven't been following closely in recent years.

A lot of people dunk on Java, but its standard library is rock solid. It is even backward compatible (mostly).
Did you dig any deeper into which paths pulled them in?
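In Rust, `cargo tree --invert <crate>` (short form `-i`) answers this directly by printing the inverted tree of everything that depends on that crate. The idea underneath is a plain graph search; the sketch below finds one shortest dependency chain with breadth-first search. All crate names in `example_graph` are made up for illustration:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical dependency graph: crate -> its direct dependencies.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("my_app", vec!["config_loader", "telemetry"]),
        ("config_loader", vec!["parser"]),
        ("telemetry", vec!["http_client"]),
        ("http_client", vec!["tls_lib"]),
    ])
}

/// Breadth-first search from `root`; returns one shortest chain to `target`.
fn find_path<'a>(
    deps: &HashMap<&'a str, Vec<&'a str>>,
    root: &'a str,
    target: &'a str,
) -> Option<Vec<&'a str>> {
    // Remember which crate we reached each crate from, to rebuild the path.
    let mut parent: HashMap<&'a str, &'a str> = HashMap::new();
    let mut seen = HashSet::from([root]);
    let mut queue = VecDeque::from([root]);
    while let Some(krate) = queue.pop_front() {
        if krate == target {
            // Walk parent links back to the root, then flip the order.
            let mut path = vec![krate];
            let mut cur = krate;
            while let Some(&p) = parent.get(cur) {
                path.push(p);
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        if let Some(children) = deps.get(krate) {
            for &dep in children {
                if seen.insert(dep) {
                    parent.insert(dep, krate);
                    queue.push_back(dep);
                }
            }
        }
    }
    None
}

fn main() {
    let deps = example_graph();
    match find_path(&deps, "my_app", "tls_lib") {
        Some(path) => println!("{}", path.join(" -> ")),
        None => println!("tls_lib is not reachable from my_app"),
    }
}
```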
Not in Rust, but I've seen it with Python in scientific computing. Someone needs to do some minor matrix math, so they install numpy. NumPy isn't so bad, but if you install it via conda, it pulls in MKL, which sits at 171MB right now (although I remember it being bigger in the past). It also pulls in intel-openmp, which is 17MB.

Just so you can multiply matrices or something.

> Someone needs to do some minor matrix math, so they install numpy

I’m just not convinced that it’s worth the pain to avoid installing these packages.

You want speedy matrix math. Why would you install some second-rate package just because it has a lighter footprint on disk? I want my dependencies rock solid so I don’t have to screw with debugging them. They’re not my core business; if (when) they don’t “just work”, it’s a massive time sink.

NumPy isn’t “left pad” so this argument doesn’t seem strong to me.

Because Rust already pays the price of compiling everything from scratch for a release build, you can pay a little extra: turn on link-time optimization and turn off codegen parallelism for release builds, and almost nothing you don't use ends up in the binary, and nothing gets duplicated. With symbol stripping enabled as well, something using tokio, clap, serde, and nalgebra (matrix stuff) can still be a 2-5 MB binary. That is still huge to me because I'm old, but you can get it smaller if you're willing to recompile std along with your other dependencies.
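For reference, a minimal sketch of what those settings look like as a Cargo release profile. These are standard Cargo profile keys; the binary size you actually get depends on your dependency tree:

```toml
# Cargo.toml
[profile.release]
lto = true          # "fat" link-time optimization across all crates
codegen-units = 1   # no parallel codegen: slower builds, more optimization
strip = true        # strip symbols from the final binary
```

Recompiling std alongside your own code currently requires nightly Cargo's unstable `-Z build-std` flag, so it isn't shown here.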
MKL is usually what you want if you are doing matrix math on an Intel CPU.

A better design is to make it easy for you to choose or hotswap your BLAS/LAPACK implementation, e.g. OpenBLAS for AMD.
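A minimal sketch of that design, assuming callers only need a row-major C = A * B: calling code depends on a small trait, and the backend is picked at runtime instead of being baked in. `Gemm`, the two toy backends, and the `MATMUL_BACKEND` variable are all hypothetical; a real setup would put MKL, OpenBLAS, or BLIS behind the trait rather than hand-written loops.

```rust
/// Anything that computes C = A * B for row-major matrices.
/// A is m x k, B is k x n, C is m x n.
trait Gemm {
    fn name(&self) -> &'static str;
    fn gemm(&self, m: usize, n: usize, k: usize, a: &[f64], b: &[f64], c: &mut [f64]);
}

/// Textbook i-j-k loops: easy to read, but strides through B.
struct Naive;
impl Gemm for Naive {
    fn name(&self) -> &'static str { "naive" }
    fn gemm(&self, m: usize, n: usize, k: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0;
                for p in 0..k {
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc;
            }
        }
    }
}

/// Same math in i-k-j order, so the inner loop walks B and C contiguously.
struct Ikj;
impl Gemm for Ikj {
    fn name(&self) -> &'static str { "ikj" }
    fn gemm(&self, m: usize, n: usize, k: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
        c.fill(0.0);
        for i in 0..m {
            for p in 0..k {
                let aip = a[i * k + p];
                for j in 0..n {
                    c[i * n + j] += aip * b[p * n + j];
                }
            }
        }
    }
}

/// Picks a backend by name; callers only ever see `dyn Gemm`.
fn select_backend(name: &str) -> Option<Box<dyn Gemm>> {
    let backend: Box<dyn Gemm> = match name {
        "naive" => Box::new(Naive),
        "ikj" => Box::new(Ikj),
        _ => return None,
    };
    Some(backend)
}

fn main() {
    // Hypothetical knob: MATMUL_BACKEND=naive|ikj, defaulting to "ikj".
    let choice = std::env::var("MATMUL_BACKEND").unwrap_or_else(|_| "ikj".to_string());
    let backend = select_backend(&choice).expect("unknown backend");
    let a = [1.0, 2.0, 3.0, 4.0]; // 2x2, row-major
    let b = [5.0, 6.0, 7.0, 8.0]; // 2x2, row-major
    let mut c = [0.0; 4];
    backend.gemm(2, 2, 2, &a, &b, &mut c);
    println!("{}: {:?}", backend.name(), c);
}
```

Keeping the choice behind a trait object means swapping backends never touches calling code; in practice the same choice is often made at build or link time instead, which avoids the dynamic dispatch.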

Edit: To be clear, Netlib (the reference implementation) is almost always NOT what you want. It's designed to be readable, not optimized for modern CPUs.
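To make "readable, not optimized" concrete: below is a Rust transcription (not the actual Fortran source) of the loop structure the reference DGEMM uses for the no-transpose case, C := alpha * A * B + beta * C in column-major storage. It is simplified: no argument checks, no transpose options, and no `alpha == 0` shortcut. Optimized libraries such as MKL, OpenBLAS, and BLIS compute the same result but add cache blocking, operand packing, and SIMD kernels.

```rust
/// Reference-style GEMM, no-transpose case, column-major storage.
/// A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
#[allow(clippy::too_many_arguments)]
fn dgemm_ref(
    m: usize, n: usize, k: usize,
    alpha: f64, a: &[f64], lda: usize,
    b: &[f64], ldb: usize,
    beta: f64, c: &mut [f64], ldc: usize,
) {
    for j in 0..n {
        // First scale (or clear) column j of C.
        if beta == 0.0 {
            for i in 0..m {
                c[i + j * ldc] = 0.0;
            }
        } else if beta != 1.0 {
            for i in 0..m {
                c[i + j * ldc] *= beta;
            }
        }
        // Then add alpha * B(l, j) * A(:, l) for each l: straight
        // column updates with no blocking, packing, or vectorization.
        for l in 0..k {
            let temp = alpha * b[l + j * ldb];
            for i in 0..m {
                c[i + j * ldc] += temp * a[i + l * lda];
            }
        }
    }
}

fn main() {
    // Column-major storage of A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]].
    let a = [1.0, 3.0, 2.0, 4.0];
    let b = [5.0, 7.0, 6.0, 8.0];
    let mut c = [0.0; 4];
    dgemm_ref(2, 2, 2, 1.0, &a, 2, &b, 2, 0.0, &mut c, 2);
    println!("{:?}", c); // prints [19.0, 43.0, 22.0, 50.0]
}
```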

I would argue that BLIS is what you want. It is proper open source and not tied to Intel platforms.