Hacker News
by orf 2421 days ago
Any project with CI should recreate the environment a lot.

The lack of a Conda lockfile makes it impractical to use outside of toy research proof of concepts.


> Any project with CI should recreate the environment a lot.

That's a very inefficient way to run your CI, with conda and pip alike.

Instead, you could build your environment once in a Docker image and use that as your build image.

Saves a lot of time on your builds, guarantees reproducibility, and will work even when package servers are unavailable.
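One way to make "build once" work in practice is to key the environment image on the dependency files themselves, so the image is only rebuilt when they change. This is a minimal Python sketch; the registry/repository name and the choice of which files to hash are hypothetical.

```python
import hashlib


def env_image_tag(dep_files: dict[str, bytes],
                  repo: str = "registry.example.com/project-env") -> str:
    """Return an image reference whose tag changes only when a dependency file changes.

    `dep_files` maps file names (e.g. "environment.yml") to their raw contents.
    """
    digest = hashlib.sha256()
    for name in sorted(dep_files):  # fixed order so the tag is deterministic
        for field in (name.encode(), dep_files[name]):
            # Length-prefix each field so different inputs can't collide by concatenation.
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return f"{repo}:env-{digest.hexdigest()[:16]}"
```

In CI you would read the files (for example `Path("environment.yml").read_bytes()`), pull the resulting tag if it already exists, and only build and push it when it does not.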

We do, but Docker caching is not terribly easy to implement in a project with lots of concurrent builds. At least not for me.

We tag images with their commit hash and use the previous commit's image as the cache source. That works, but it has issues with merges and with the first commit to a branch.

We also tag them by branch name, but Docker has issues with specifying multiple `--cache-from` arguments on the command line, which causes unnecessary cache invalidations on branches.
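For reference, here is a sketch of the branch/commit cache scheme described above, assuming a hypothetical repository and tag layout. It builds a `docker build` argument list with several `--cache-from` sources, most specific first, and falls back to the default branch's image for the first build of a new branch. `--cache-from` may be repeated, and the `BUILDKIT_INLINE_CACHE=1` build arg embeds the cache metadata BuildKit needs to reuse a pushed image; how several sources are honoured has varied between builders, which may be exactly the problem described here.

```python
import re
from typing import Optional


def tag_for_branch(branch: str) -> str:
    """Turn a git branch name into a valid Docker tag.

    Docker tags may only contain [A-Za-z0-9_.-], must not start with '.' or '-',
    and are limited to 128 characters.
    """
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch).lstrip(".-")
    return tag[:128] or "unnamed-branch"


def docker_build_argv(repo: str, commit: str, parent_commit: Optional[str],
                      branch: str, default_branch: str = "main") -> list[str]:
    branch_tag = tag_for_branch(branch)
    # Cache sources, most specific first: this branch's latest image, the parent
    # commit's image, then the default branch as a fallback for a branch's first build.
    candidates = [f"{repo}:{branch_tag}"]
    if parent_commit:
        candidates.append(f"{repo}:{parent_commit}")
    candidates.append(f"{repo}:{tag_for_branch(default_branch)}")

    argv = [
        "docker", "build",
        # With BuildKit, embed cache metadata so the pushed image can serve as a cache source later.
        "--build-arg", "BUILDKIT_INLINE_CACHE=1",
        "-t", f"{repo}:{commit}",
        "-t", f"{repo}:{branch_tag}",
    ]
    for ref in dict.fromkeys(candidates):  # de-duplicate while keeping order
        argv += ["--cache-from", ref]
    argv.append(".")
    return argv
```

The returned list can be passed straight to `subprocess.run(argv, check=True)`.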

That all leads to more rebuilds than I would like.

CI isn’t as sensitive to a few more seconds though, as it isn’t interactive.

And while lockfiles are convenient, lacking them hardly makes anything “impractical”. I’ll take conda’s correctness over minor conveniences any day.

It adds many minutes on complex trees. That adds up fast even if you’re doing just 60 builds a day.

Lockfiles are not just convenient, they are all but required for anything serious. You can’t have your dependencies break or change under your feet from one build to the next. Your builds need to be reproducible.

I’ll take all that over whatever conda has to offer any day.

> Lockfiles are not just convenient, they are all but required for anything serious. You can’t have your dependencies break or change under your feet from one build to the next. Your builds need to be reproducible.

I see. So because C, C++ and most other build environments don't have lockfiles, there is no way to do reproducible builds. I better tell the Debian people they've been wasting an awful lot of time on their reproducible build project. /s

Seriously, "lockfiles are required for anything serious"? That's ridiculous. But if you insist, a quick google shows e.g. [0] and [1] provide that.

[0] https://github.com/Nextdoor/conda_lockfile

[1] https://picky-conda.readthedocs.io/en/latest/index.html

I’ll leave it to someone else to explain to you how a .gitmodules file or a source tarball URL with a hash is a lockfile equivalent, how “C, C++” isn’t a package manager, or how Debian does in fact use several methods of specifying locked dependencies in a file format (a lockfile, if you will).

Most build environments do have lockfiles. And, just to clarify, that doesn’t have to be a specific dedicated file. It has to be something that can be versioned alongside the code, so each build gets the exact same set of dependencies and updates are explicit commits.
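A minimal sketch of that principle as a CI step, assuming the lockfile is a plain list of `name==version` lines committed next to the code: the build fails if the installed set differs from the committed one. Names are compared after PEP 503 normalization; the function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    # PEP 503 name normalization: runs of -, _ and . become "-", lower-cased.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_lock(text: str) -> dict[str, str]:
    """Parse 'name==version' lines, ignoring blank lines and # comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {raw!r}")
        pins[normalize(name.strip())] = version.strip()
    return pins


def lock_mismatches(lock_text: str, installed: dict[str, str]) -> list[str]:
    """Describe every difference between the lockfile and the installed packages."""
    pins = parse_lock(lock_text)
    have = {normalize(name): version for name, version in installed.items()}
    problems = []
    for name, want in sorted(pins.items()):
        got = have.get(name)
        if got is None:
            problems.append(f"{name}: locked {want}, not installed")
        elif got != want:
            problems.append(f"{name}: locked {want}, installed {got}")
    for name in sorted(set(have) - set(pins)):
        problems.append(f"{name}: installed {have[name]} but not in lock")
    return problems
```

In a real job, `installed` could come from `{d.metadata["Name"]: d.version for d in importlib.metadata.distributions()}`, and a non-empty result fails the build.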

This basic principle is a requirement for anything serious (i.e. something with customers). I’m sorry that this statement hit a nerve for you, but it’s true. In fact, in some industries it’s a legal requirement.

And no, a dodgy third party plugin that hasn’t been updated in a year isn’t a good solution.

You missed the /s sarcasm tag in my post. You can, in fact, specify exact versions in conda without any external plugins, and people who are serious about their builds being reproducible do.

In fact, they also pin a specific gcc for extensions that need one, because relying on the system gcc is not reproducible. How do you do that in pip/venv/pipenv/poetry?

Specifying first-level versions is not the same as a lockfile: it does not guarantee the same tree of dependencies each time, so it is no way to make your builds reproducible. I'm not sure you're clear on exactly what a lockfile is, but the problem it solves is this:

You depend on `cool-package==1.0`. That depends on `another-package` with a loose specifier, i.e. `another-package=LATEST`.

Now when `another-package` is updated, you end up with a different tree of dependencies, because package resolution runs again and `another-package=LATEST` is installed. Imagine you're in the middle of rolling back (or rolling out) a deploy of your product. Suddenly what you've been testing and working with has changed, and `another-package=LATEST` breaks the deploy due to some changes or bugs.

What's worse is that it's now harder to roll back, as a re-build and a re-deploy will still bring in the broken `another-package=LATEST`!
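The failure mode above can be reproduced with a toy resolver. The index and packages are hypothetical, there is no conflict handling, and a `*` specifier stands in for `LATEST`: resolving the same top-level requirement against a newer index silently picks up `another-package` 0.2, while resolving from the full locked tree does not.

```python
from collections import deque


def version_key(version: str) -> tuple[int, ...]:
    # Toy versions only: dotted integers like "0.1" or "1.0".
    return tuple(int(part) for part in version.split("."))


def resolve(index: dict, requirements: dict[str, str]) -> dict[str, str]:
    """Resolve `requirements` against `index`.

    `index` maps package -> {version: {dependency: specifier}}.
    A specifier is either "==X" (exact pin) or "*" (take the newest available).
    The first specifier seen for a package wins; there is no conflict handling.
    """
    resolved: dict[str, str] = {}
    queue = deque(requirements.items())  # top-level requirements are processed first
    while queue:
        name, spec = queue.popleft()
        if name in resolved:
            continue
        available = index[name]
        if spec == "*":
            version = max(available, key=version_key)
        else:
            version = spec.removeprefix("==")
        resolved[name] = version
        queue.extend(available[version].items())
    return resolved


if __name__ == "__main__":
    monday = {
        "cool-package": {"1.0": {"another-package": "*"}},
        "another-package": {"0.1": {}},
    }
    # Later, another-package 0.2 is published (and, in the story above, broken).
    tuesday = {
        "cool-package": {"1.0": {"another-package": "*"}},
        "another-package": {"0.1": {}, "0.2": {}},
    }
    lock = resolve(monday, {"cool-package": "==1.0"})
    print(resolve(tuesday, {"cool-package": "==1.0"}))  # another-package drifts to 0.2
    print(resolve(tuesday, {n: "==" + v for n, v in lock.items()}))  # stays at 0.1
```

Because every package in the lock is a top-level exact pin, the pins are always seen before any loose transitive specifier.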

The solution is to lock the tree of dependencies, including all sub-dependencies. This has the added advantage of speeding up installation, as resolution doesn't need to happen. So your package tree is locked to:

    cool-package==1.0
    another-package==0.1 # Non-broken!

That makes any updates to your packages safe, versioned and able to be reverted.
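A rough sketch of producing such a fully pinned list from an existing Python environment, similar in spirit to `pip freeze`, using the standard library's `importlib.metadata`. Real lockfile formats usually also record artifact hashes; this only records versions.

```python
from importlib import metadata
from typing import Iterable, Optional


def freeze(pairs: Optional[Iterable[tuple[str, str]]] = None) -> list[str]:
    """Return sorted, de-duplicated "name==version" lines.

    By default this reads every distribution installed in the running environment;
    pass `pairs` explicitly to format an arbitrary (name, version) listing.
    """
    if pairs is None:
        pairs = ((dist.metadata["Name"], dist.version) for dist in metadata.distributions())
    # Skip entries with no name (e.g. broken metadata).
    lines = {f"{name}=={version}" for name, version in pairs if name}
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Redirect this to a file and commit it next to the code.
    print("\n".join(freeze()))
```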

The lack of this feature makes Conda a no-go for anything serious.

> specific gcc for extensions that need it, because relying on the system gcc is not reproducible

apt install gcc=4:4.9.2-2