by nchuhoai 3395 days ago
I'm so surprised that both Docker's and Google's solutions to this problem involve zero caching.

Much of our container building involves fetching dependencies from various package managers. These rarely change across builds, yet fetching them takes up the majority of the build time. Most of the time that's good enough, but when we want to deploy a fix immediately, it can be quite frustrating to have to spend all this time seemingly unnecessarily.

Edit: Docker 1.13 has the new --cache-from option, so it seems relatively straightforward to do, I guess?
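
A minimal sketch of how --cache-from might be wired into a build script, assuming the previous image was pushed to a registry and is pulled first so its layers are available locally. The image names are hypothetical, and the function only assembles the command lines rather than running them:

```python
import shlex


def cache_from_commands(cache_image, new_image, context="."):
    """Return the docker commands for a build that reuses layers from cache_image.

    cache_image and new_image are hypothetical image references chosen by the caller.
    """
    return [
        # Pull the previously built image so its layers exist locally.
        ["docker", "pull", cache_image],
        # Build, allowing Docker to reuse matching layers from that image.
        ["docker", "build", "--cache-from", cache_image, "-t", new_image, context],
    ]


if __name__ == "__main__":
    for cmd in cache_from_commands("example.com/app:latest", "example.com/app:fix"):
        print(shlex.join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine that has Docker installed.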

4 comments

Full Disclosure: I work on the Quay team at CoreOS

The Quay build system has implemented caching. The builder preemptively calculates the hashes of Dockerfile commands and then sends them to an API endpoint that finds the tag for the most similar tree of command hashes in a given repository. This tag is pulled before ever attempting to build the Dockerfile.
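
As a rough illustration of that idea (not Quay's actual code or API, which isn't shown here), the matching could be sketched like this. It assumes "most similar" means the longest shared prefix of chained command hashes, and `command_hashes` and `best_cache_tag` are made-up names:

```python
import hashlib


def command_hashes(commands):
    """Hash each Dockerfile command together with the hash of everything before it."""
    hashes = []
    parent = ""
    for cmd in commands:
        parent = hashlib.sha256((parent + "\n" + cmd).encode("utf-8")).hexdigest()
        hashes.append(parent)
    return hashes


def best_cache_tag(new_commands, tags):
    """Pick the tag whose command hashes share the longest prefix with the new build.

    tags maps a tag name to the list of commands that built it (an assumption;
    a real service would presumably store the hashes themselves).
    Returns (tag, shared_command_count), or (None, 0) if nothing matches.
    """
    wanted = command_hashes(new_commands)
    best_tag, best_len = None, 0
    for tag, commands in sorted(tags.items()):
        shared = 0
        for a, b in zip(wanted, command_hashes(commands)):
            if a != b:
                break
            shared += 1
        if shared > best_len:
            best_tag, best_len = tag, shared
    return best_tag, best_len
```

The builder would then pull the returned tag before building, so that Docker's own layer cache gets hits for the shared prefix.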

Google's solution seems far more general than just building Dockerfiles. I'm glad they're pushing for more innovation in this space.

I'd give this a look: https://github.com/GoogleCloudPlatform/cloud-builders/tree/m... (unfortunately key links to Bazel docs are 404'ing.)

Bazel is a general-purpose build system that focuses on reproducibility (same build inputs -> same build outputs) and strict dependency specification (at various levels: file, subsystem, external package fetching, etc.). The strictness is enforced by sandboxing, which allows for fast incremental builds. Bazel is the open-source version of Google's internal build system.
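
To make the "same inputs -> same outputs" point concrete, here is a toy sketch of a content-addressed action cache. This is not Bazel's API or implementation; `ActionCache` and its methods are hypothetical names, and it assumes every input of an action is declared up front:

```python
import hashlib


class ActionCache:
    """Toy cache: an action reruns only if its command or declared inputs change."""

    def __init__(self):
        self._outputs = {}
        self.executions = 0  # counts how often an action actually ran

    @staticmethod
    def key(command, inputs):
        """Derive a cache key from the command and the contents of every input."""
        h = hashlib.sha256()
        h.update(command.encode("utf-8"))
        for name in sorted(inputs):
            h.update(b"\0" + name.encode("utf-8") + b"\0" + inputs[name])
        return h.hexdigest()

    def run(self, command, inputs, action):
        """Return the cached output for these inputs, running action only on a miss."""
        k = self.key(command, inputs)
        if k not in self._outputs:
            self.executions += 1
            self._outputs[k] = action(inputs)
        return self._outputs[k]
```

This only works if the action reads nothing beyond its declared inputs, which is the property sandboxing is there to enforce.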

More info here: https://bazel.build/

How external resources are handled: https://bazel.build/versions/master/docs/external.html

Obviously Bazel requires some buy-in which may be unappealing (at least for existing projects: migration/supporting multiple build systems.) There are, however, serious benefits (and a larger, but nascent, ecosystem of related open-source tooling.) The biggest gap at the moment is supported languages/etc.

(I don't work for Google, I just like Bazel.)

Thanks for pointing out the broken links, I'll fix those shortly. Until then, https://bazel.build/versions/master/docs/be/docker.html has the relevant information about Docker in Bazel.
We just open sourced our monorepo which uses Bazel - https://GitHub.com/staffjoy/v2

It has some major speed benefits. But it's not turnkey. Setting it up is hard. Its dep management in Go isn't compatible with lint, vet, etc. We never got node modules installing with Bazel.

If you send me an email (skelterjohn at google dot com) I'd love to help you get things working with that repo and GCCB.
Thanks for pointing out the broken links, they are now fixed. (I'm on the Google team behind this project.)
I tried to compile TensorFlow recently; it uses Bazel as its build system. When compiling, it downloads external dependencies to be used during the build. It failed to retrieve some dependencies and I had to restart the process, only for it to fail on other dependencies, and it seemed to download all of them again (even though it had already downloaded the others successfully).

Googling for this issue, I found out that if you fail and retry, you might even be blocked (by the servers) for trying to fetch too many dependencies, and the solution was to "try later"...

One approach to this problem that I like is storing these dependencies in an intermediate image. Then, you can do an incremental build by referencing (rather than rebuilding) this intermediate image, and pay only the cost of downloading the raw data which should be cheap compared to the normal install process.
Why not leverage Docker's built-in image layer caching?
I think the idea here is that Docker-level caching isn't well suited for this. E.g., for us, we install all relevant gems with `bundle install`. If you update one gem, the whole layer is rebuilt. Since the whole `bundle install` layer is rebuilt, every gem is fetched again from rubygems.

Yarn works the same.

Having a separate FROM image doesn't help that situation either, since you have to rebuild it.
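
The all-or-nothing invalidation described above for `bundle install` can be modeled roughly like this. It's a simplified sketch, not Docker's real algorithm: it assumes a layer's cache key depends on its parent layer, the instruction text, and the contents of any files the instruction copies in. The Dockerfile steps and file names are hypothetical:

```python
import hashlib


def layer_keys(steps, files):
    """Compute a cache key per step.

    steps: list of (instruction, [names of files that instruction copies in])
    files: dict mapping file name to its contents as bytes
    """
    keys = []
    parent = ""
    for instruction, reads in steps:
        h = hashlib.sha256()
        h.update(parent.encode("utf-8"))       # any change upstream changes this key
        h.update(instruction.encode("utf-8"))
        for name in reads:
            h.update(files[name])
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(steps, old_files, new_files):
    """List the instructions whose layers miss the cache after the files change."""
    old = layer_keys(steps, old_files)
    new = layer_keys(steps, new_files)
    return [instr for (instr, _), a, b in zip(steps, old, new) if a != b]


STEPS = [
    ("FROM ruby:2.3", []),
    ("COPY Gemfile Gemfile.lock ./", ["Gemfile", "Gemfile.lock"]),
    ("RUN bundle install", []),
    ("COPY . /app", ["app.rb"]),
]
```

In this model, bumping a single gem version in `Gemfile.lock` changes the key of the `COPY` layer and therefore of `RUN bundle install`, so the whole install runs from scratch even though only one gem differs.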
Couldn't you FROM and then update? I'm not a Ruby expert but I'd hope that it could be incremental.
That might be less effective if you can't guarantee that all the builds end up always on the same node.
Some century the industry will discover the NixOS model...
I like what I understand to be the core ideas of NixOS, but I have to say: there might be a chance that some century, people both inside and outside the industry will understand that new syntax, or new languages, are rarely necessary, or even beneficial.

Use different semantics if you must, but please use some existing syntax for your DSL.

I'm so tired of learning a pointless new variety of almost the same thing, only with slightly different syntax, maybe some awkward string quoting, and some rather pointless syntactic sugar.

I'm not saying NixOS is the worst offender, but innovating in (mostly) superficial syntax, or using obscure syntax, is almost certainly not time well spent except in a few very narrow niches/contexts. Especially not for prospective users.

I get it, it's fun inventing both languages and tools, I love doing it myself!

However, and in general: Don't invent a language when you need a tool, and vice versa.

The alternative is co-opting an existing language, which is what Ansible and SaltStack do with YAML. I find that to be much worse, since far more is valid as plain YAML than is valid as an Ansible/Salt file.
Agreed, Nix the language is probably unnecessary. (The main benefit I can think of is avoiding depending on the moving target that is basically any other scripting language they could have tried to co-opt.)
To be fair, Nix language is completely different from other languages out there -- it employs only immutable data structures -- so using another language would not be possible.

It does not excuse Nix to have such a braindamaged^W esoteric syntax.

> To be fair, Nix language is completely different from other languages out there -- it employs only immutable data structures -- so using another language would not be possible.

I don't think the immutability is actually important. Only derivations (which are basically language agnostic) need to be immutable/hashable.

> It does not excuse Nix to have such a braindamaged^W esoteric syntax.

What's so bad about it? The main pain point I've found with it is the lack of documentation for builtin functions. The syntax seems fine to me.

A programming language that "employs only immutable data structures" is effectively a macro processor language, no? Like https://en.wikipedia.org/wiki/M4_(computer_language) .
Jsonnet is the closest I could find.
I feel like I should plug Guix at this point. Pretty much similar to Nix, but the config language is Guile, which is Scheme.