by whazor 211 days ago
When I worked on an enterprise data analytics platform, a big problem was Docker image growth. People were using different Python versions, different CUDA versions, all kinds of libraries. With CUDA being over a gigabyte, this all explodes.

The solution is to decompose the Docker images and make sure that every layer is hash equivalent. So if people update their CUDA version, it results in a change within the Python layers.
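
To make "hash equivalent" concrete, here is a minimal sketch of how layer identity works, assuming the DiffID/ChainID scheme described in the OCI image spec. The layer contents are hypothetical stand-ins for real layer tarballs, and the helper names are made up for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest in the 'sha256:<hex>' form used for image layers."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids: list[str]) -> list[str]:
    """ChainID of each layer: the first is its own DiffID, each later one
    hashes 'parent_chain_id + " " + diff_id' (assumed from the OCI spec)."""
    chain: list[str] = []
    for diff_id in diff_ids:
        if not chain:
            chain.append(diff_id)
        else:
            chain.append(digest(f"{chain[-1]} {diff_id}".encode()))
    return chain


# Hypothetical layer contents; real DiffIDs hash the uncompressed layer tar.
cuda_12_1 = digest(b"cuda 12.1 files")
cuda_12_4 = digest(b"cuda 12.4 files")
python_311 = digest(b"python 3.11 files")

before = [cuda_12_1, python_311]
after = [cuda_12_4, python_311]

# The Python layer's own content digest is unchanged, so the blob is shareable.
python_layer_reused = before[1] == after[1]
# Its position in the stack is not: the chain ID depends on the layer below.
python_chain_id_changed = chain_ids(before)[1] != chain_ids(after)[1]
```

The point of the sketch: sharing only works if the layer bytes come out bit-for-bit identical, which is why the build has to be decomposed carefully in the first place.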

But it looks like Flox now simplifies this via Nix. Every Nix package already has a hash and you can combine packages however you would like.


I was an early and enthusiastic adopter of docker. I really liked how it would let me use layers to keep track of dependency between files.

After spending a few years using nix, the docker image situation looks pretty bonkers. If two files end up in separate layers, the system assumes dependency so if the lower file changes you need to build a separate copy of the higher one just in case there's actual dependency there.

Within nix you can be more precise about what depends on what, which is nice, but you do have to be thoughtful about it or you can summon the same footgun that got you with docker, just in smaller form. Because a nix derivation, while a box with nicely labeled inputs and output, is still a black box. If you insert a readme as an input to a derivation that does a build, nix will assume that the compiled binary depends on it and when you fix a typo in the readme and rebuild you'll end up with a duplicate binary build in the nix store despite the contents of the binary not actually depending on the text of the readme.
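
The README footgun can be sketched with a toy model. This is not Nix's actual hashing scheme, just an illustration of input addressing: the output name is derived from every declared input, whether or not the build reads it. `store_hash` and `only_sources` are hypothetical helpers.

```python
import hashlib
import json


def store_hash(name: str, builder: str, inputs: dict[str, bytes]) -> str:
    """Toy input-addressed hash: covers every declared input, used or not."""
    payload = {
        "name": name,
        "builder": builder,
        "inputs": {
            path: hashlib.sha256(data).hexdigest()
            for path, data in sorted(inputs.items())
        },
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:32]


def only_sources(inputs: dict[str, bytes]) -> dict[str, bytes]:
    """Hypothetical filter: keep just the files the build actually reads."""
    return {p: d for p, d in inputs.items() if p.endswith((".c", ".h"))}


src = {"main.c": b"int main(void) { return 0; }\n", "README.md": b"Teh docs\n"}
fixed = {**src, "README.md": b"The docs\n"}  # typo fix in the README only

# Whole directory as input: the README typo fix yields a new output hash.
unfiltered_rebuilds = (
    store_hash("hello", "cc main.c", src)
    != store_hash("hello", "cc main.c", fixed)
)
# Filtered input: the hash is unchanged, so the old build is reused.
filtered_rebuilds = (
    store_hash("hello", "cc main.c", only_sources(src))
    != store_hash("hello", "cc main.c", only_sources(fixed))
)
```

In real Nix the equivalent of `only_sources` would be a source filter, for example `lib.cleanSourceWith` or the `filter` argument of `builtins.path`. Being "thoughtful about it" mostly means remembering to use one.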

> you can combine packages however you would like

So this is true, more or less, but be aware that while nix lets you do this in ways that don't force needless duplication, it doesn't force you to avoid that duplication. Things carelessly packaged with nix can easily recreate the problem you mentioned with docker.

> If you insert a readme as an input to a derivation that does a build, nix will assume that the compiled binary depends on it and when you fix a typo in the readme and rebuild you'll end up with a duplicate binary build in the nix store despite the contents of the binary not actually depending on the text of the readme.

One of my issues with Nix is the black box that is the store, and maybe it's just my system, but over time I find it full of redundant files / orphans and no obvious way to flatten it or clean it safely without breaking something.

I wonder how flox solves this.

Both fair points. The README rebuild issue is a Nix hiccup we don't solve; our quantized catalog reduces cascading rebuilds from upstream churn, but input over-specification is still there.

On store bloat: Flox makes it clearer what's in use (explicit environments vs. implicit dependencies), but you still need nix-collect-garbage.

The store accumulates cruft, that's Nix reality, we haven't changed it.

Just to follow up on this, Flox puts packages in one group by default so they share dependencies, plus our quantized catalog means way less version spread than raw Nix. So I do think we still improve on the Nix story, here.

We're also adding "stabilities" (downsample from daily to weekly/monthly snapshots) to reduce churn even more. Still need GC, but a lot fewer bags on trash day.

Have you found that nix-store --gc breaks things, or are you concerned that it's not an aggressive enough garbage collection?

When I have space problems, I run that and they're gone, then later I do it again. It could be that I'm just avoiding functionality that it breaks though.

There are still leftovers. For me, it's not about a space problem, but more of wanting to run a tight ship. Most recently, I tried running all the stuff on the wiki[1] but I still had leftovers that wouldn't go away no matter what (these were some CUDA dependencies, if it makes a difference). In the end I ended up blowing away my entire Nix install, manually deleting the store and reinstalling Nix - which isn't exactly ideal - considering that the whole point of Nix is to be deterministic and reproducible and all that jazz... so to me it doesn't make sense that it dirties the host and doesn't clean up after itself. Since then I've gone back to using containers as at least they don't pollute the host, and I feel like I'm in greater control over the entire environment.

[1] https://nixos.wiki/wiki/Cleaning_the_nix_store

I salute your desire to run a tight ship. If there were a leak in the nix store, I'd never know until my garbage collection stopped solving the problem and I'm sure it would be at the worst possible time. If such a leak exists and is found, it'll be by someone running a tight ship.

Instead of running a tight ship I spend a lot of time dreaming about alternate computational universes, and one I particularly like is where new hard disks come pre-loaded with a fragment of bits deemed culturally valuable. Wikipedia and a well curated subset of nixpkgs would be a fine start to such an archive. In this world your files don't grow/shrink to consume/yield empty space, but rather the boundary between your data and that drive's shard of the public archive shifts in one direction or another. This way you or somebody in your neighborhood is likely to already have the file you need, so you can get it from them instead of the internet. Better for being able to roll with the punches if the internet is partitioned.

I don't worry about the size of the nix store because according to this weird fantasy of mine it's on the side of my disk that shrinks when I add files to it; it's not the contained object, but the gas that expands to fill the rest of the container. Not by accident, but as part of a redundant worldwide distributed cache that we put together after deciding that servers controlled by people we don't know are not to be relied on.

I feel your pain on this one despite being fairly comfortable with nix by now. This is 100% an issue either the documentation or the nix CLI should do a better job at.

The wiki rightfully points towards "roots", i.e. references produced by nix-build or similar. Additionally, there are other places that will keep references and hence block garbage collection though:

1. Your nix profile (`nix profile list` / `nix profile remove`) and its old generations (`nix profile (wipe-)history`)

2. Your NixOS configuration (configuration.nix) and its old generations (`nixos-rebuild list-generations`)

It doesn't help that there's no discoverable way to tell why a particular nix store path is not being garbage collected either.
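
As a mental model of why old generations pin paths, here is a toy sketch of root-based garbage collection: anything reachable from a root stays, and recording how each path was first reached answers the "why is this still here" question. The store paths, root names, and helper functions are all hypothetical; this is not how the nix CLI is implemented.

```python
from collections import deque


def live_paths(roots: dict[str, str], references: dict[str, list[str]]):
    """Breadth-first walk from every GC root; returns the live set and,
    for each live path, the root or referrer that first reached it."""
    live: set[str] = set()
    parent: dict[str, str] = {}
    queue: deque[str] = deque()
    for root_name, path in roots.items():
        if path not in live:
            live.add(path)
            parent[path] = root_name
            queue.append(path)
    while queue:
        current = queue.popleft()
        for dep in references.get(current, ()):
            if dep not in live:
                live.add(dep)
                parent[dep] = current
                queue.append(dep)
    return live, parent


def why_alive(path: str, parent: dict[str, str]) -> list[str]:
    """Chain from a GC root down to `path` (root name first)."""
    chain = [path]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return list(reversed(chain))


# Hypothetical store: an old profile generation still references CUDA.
references = {
    "env-old": ["cuda-11"],
    "env-new": ["python-3.11"],
    "cuda-11": ["glibc"],
    "python-3.11": ["glibc"],
}
roots = {"profile-41-link": "env-old", "profile-42-link": "env-new"}

live, parent = live_paths(roots, references)
cuda_reason = why_alive("cuda-11", parent)

# Dropping the old generation removes the only root that reaches cuda-11.
live_after, _ = live_paths({"profile-42-link": "env-new"}, references)
garbage = set(references) - live_after
```

In this model the CUDA leftovers only become collectable once the old generation's root is gone, which matches the advice above about wiping profile history before collecting garbage.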

The problem is that whiteouts are not commutative. If the layers you build turn out to be bit for bit identical the layers will be shared anyway, but it's much more complex than Nix where the composition operation is commutative.
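
A small sketch of the commutativity point, using plain dicts as stand-in filesystems. The `WHITEOUT` marker, the paths, and the store hashes (`aaaa`, `bbbb`) are made up for illustration; real overlay whiteouts are special files, not Python objects.

```python
WHITEOUT = object()  # stand-in for an overlay whiteout entry


def apply_layers(*layers: dict) -> dict:
    """Apply layers bottom to top; a whiteout deletes the path below it."""
    fs: dict = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                fs.pop(path, None)
            else:
                fs[path] = content
    return fs


adds_config = {"/etc/app.conf": "debug=1"}
removes_config = {"/etc/app.conf": WHITEOUT}

# Same two layers, different order, different filesystem.
add_then_remove = apply_layers(adds_config, removes_config)
remove_then_add = apply_layers(removes_config, adds_config)


def compose_store(*packages: dict) -> dict:
    """Store-style composition: every package lives under its own hashed
    prefix, so combining packages is a plain union with no deletions."""
    store: dict = {}
    for package in packages:
        store.update(package)
    return store


python_pkg = {"/nix/store/aaaa-python-3.11/bin/python": "python binary"}
cuda_pkg = {"/nix/store/bbbb-cuda-12.4/lib/libcudart.so": "cuda runtime"}

store_order_irrelevant = (
    compose_store(python_pkg, cuda_pkg) == compose_store(cuda_pkg, python_pkg)
)
```

Because store paths never collide and never delete each other, the union gives the same result in any order, which is the property stacked layers lack.
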
Yes, there were various attempts to do this in the container ecosystem, but there is a hard limit on layers on Docker images (because there are hard limits on overlay mounts; you don't really need to overlay all the Nix store mounts of course as they have different paths but the code is for the general case). So then there were various ways of bundling sets of packages into layers, but just managing it directly through Nix store is much simpler.

And I'm back in the land of the living. Can't really beat a response from Justin Cormack!
Yes, this hits the nail on the head. We’ve seen the same explosion in image size and rebuild complexity, especially with AI/ML workloads where Python + CUDA + random pip wheels + system libs = image bloat and massive rebuilds.

With the Kubernetes shim, you can run the hash-pinned environments without building or pulling an image at all. It starts the pod with a stub, then activates the exact runtime from a node-local store.