Hacker News
by kbenson 3793 days ago
I think that's a bit crazy as well. This is a problem if your build process happens often and requires pulling external data. Ideally, you want a way to cache that external data, and a way to force invalidation of that cache.

Building, at least after the first time, should not require external access. There are security reasons for this as well.

2 comments

So your proposed solution is one of the only two hard problems in computer science? That should be a solid clue that you're wrong.

"There are only two hard things in Computer Science: cache invalidation and naming things."

-- Phil Karlton

By "a way to force invalidation of that cache" I didn't mean automatic invalidation; I meant a way to flag that you want it to re-download dependencies and store them for later use. I'm not sure where you got the requirement that it needs to be automatically determined by a computer from my comment. I was thinking the "cache" could be as simple as the person setting up the build environment downloading the dependencies and configuring the build to use them. In a discussion about automatically downloading dependencies during a build, that counts as a local cache.

Set up your build environment with whatever manual intervention is required so that it can run without downloading remote resources. Build as needed. There is no reason for, and many reasons against, downloading dependencies during the build process, but that doesn't necessitate duplicating those dependencies within your own source tree. As long as there are directions on how to download a specific, definitive version of the dependency, whether that is automated or not isn't really a big deal if it's done infrequently.
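To make the "local cache with a manual refresh flag" idea concrete, here is a minimal sketch in Python. The function name `cached_dependency`, the `download` callback, and the `force` flag are all hypothetical names for illustration, not part of any real build tool. The build only touches the network when the file is missing or when a person explicitly asks for a refresh.

```python
from pathlib import Path
from typing import Callable


def cached_dependency(
    name: str,
    cache_dir: Path,
    download: Callable[[str], bytes],
    force: bool = False,
) -> Path:
    """Return the local path of a dependency, downloading only when needed.

    `download` is a hypothetical callback that fetches the bytes for `name`.
    It is called only if the file is not cached yet or if `force` is True,
    so invalidation is a manual decision rather than an automatic one.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if force or not target.exists():
        target.write_bytes(download(name))
    return target
```

Normal builds would call this with `force=False` and run entirely offline once the cache is populated. Passing `force=True` is the explicit "re-download and store for later" step.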

wow, i never realized cache invalidation was one of the ONLY two hard problems in CS
The quote is supposed to be, two hard problems: cache invalidation, naming conventions and off-by-one errors.
It's not; if you cannot even run that first build, then you actually have nothing to work on.

Also, not freezing dependencies means you are at the mercy of any dependency change breaking your build at any time.

Even if your first build runs, fetches those deps, and succeeds at T1, there is no guarantee at all that the build will work at T1+n.

There is a big difference between your team working from trunk and your team being dependent on other projects' trunks.

Just because you're downloading your dependencies at runtime doesn't mean you have to have non-frozen dependencies or non-repeatable builds... that's one of the advantages of pulling dependencies out of a Git repository; specify a specific revision to build against and that code is guaranteed* to not change. Pulling dependencies from Git doesn't mean you're working against trunk.
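As a rough illustration of "specify a specific revision", a build could refuse any dependency whose ref is not a full commit hash. This is only a sketch: the names `is_pinned` and `unpinned` and the manifest shape (a dict of dependency name to ref) are assumptions made for the example, and the check assumes classic 40-character SHA-1 object names.

```python
import re
from typing import Dict, List

# A full SHA-1 object name: exactly 40 lowercase hex characters.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_pinned(ref: str) -> bool:
    """True if `ref` looks like a full commit hash rather than a branch or tag."""
    return _FULL_SHA1.fullmatch(ref) is not None


def unpinned(deps: Dict[str, str]) -> List[str]:
    """Return the sorted names of dependencies whose ref can move."""
    return sorted(name for name, ref in deps.items() if not is_pinned(ref))
```

A build script could call `unpinned(...)` on its manifest and fail early if the list is not empty. It is a heuristic, since a branch could in principle be given a name that looks like a hash.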

Now, if you're doing this with mission-critical software, you should probably be maintaining mirrors of those dependencies locally on infrastructure you control, but, again, that's another of the things that Git makes easy.

You should never be dependent on a reference that can move, unless you're willing to accept the consequences (that includes branches in any version control system, tags if you don't have infrastructure to verify that they haven't changed, external non-version-controlled downloads, etc.).
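For references that can move (tags, plain downloads), one common way to notice a change is to record a checksum when you first vet the artifact and compare against it later. A minimal sketch, assuming you store a SHA-256 hex digest next to your build configuration; `verify_download` is a hypothetical helper name:

```python
import hashlib


def verify_download(data: bytes, expected_sha256: str) -> bytes:
    """Return `data` unchanged if its SHA-256 matches, else raise ValueError."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )
    return data
```

This does not stop the upstream from changing, it only makes the change fail loudly instead of silently ending up in your build.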

Basically, what you should learn here is that you shouldn't build your business around a third-party service's continued availability. Especially if it's a third-party service where you're not paying for an SLA, like Github. Reproducibility of builds is a different issue, and including 100% of your dependencies in your own source repository is not the only solution to it.

* Barring a SHA-1 collision, which is highly unlikely with Git.

> It's not; if you cannot even run that first build, then you actually have nothing to work on.

Obviously you can run the first build. You wouldn't be using Github if you never got it working in the first place.

To clarify, setting up the build environment may require network access, but if the process of building requires it, there are many places where it can go wrong, both operationally and security-wise.

> Also, not freezing dependencies means you are at the mercy of any dependency change breaking your build at any time. ...

I agree, but that's a separate discussion and doesn't really apply here. There's nothing preventing the pulling of a specifically tagged version for builds. If someone's build process uses Git for dependencies and is not doing this, then it is irrelevant whether they are using Github or some internal server; the same problems apply.