by 908B64B197 2053 days ago
Question: What keeps an organization from hosting its own package mirror internally and only periodically fetching the diffs from the central registry?

There are tools for doing this, but it's a matter of cost and complexity to deal with them.
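Here is a minimal Python sketch of the question's "periodically fetch the diffs" idea. It assumes a hypothetical index format (package name mapped to a set of version strings) rather than any real registry's API. It also only ever adds versions, so packages deleted upstream stay available locally.

```python
from typing import Dict, Set

# Hypothetical index shape: package name -> set of published versions.
Index = Dict[str, Set[str]]


def versions_to_fetch(upstream: Index, local: Index) -> Dict[str, Set[str]]:
    """Return only the package versions the local mirror is missing."""
    missing: Dict[str, Set[str]] = {}
    for name, versions in upstream.items():
        new = versions - local.get(name, set())
        if new:
            missing[name] = new
    return missing


def sync(upstream: Index, local: Index) -> Dict[str, Set[str]]:
    """Add the missing versions to the local index and report what was fetched."""
    delta = versions_to_fetch(upstream, local)
    for name, versions in delta.items():
        # A real mirror would download the artifacts for these versions here.
        local.setdefault(name, set()).update(versions)
    return delta


if __name__ == "__main__":
    upstream = {"left-pad": {"1.0.0", "1.1.0"}, "lodash": {"4.17.21"}}
    local = {"left-pad": {"1.0.0"}}
    print(sync(upstream, local))  # first run fetches only the delta
    print(sync(upstream, local))  # second run has nothing left to fetch
```

Run on a schedule, each pass transfers only what changed upstream since the previous pass.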

Artifactory seems to have a pretty big chunk of this vertical. It supports a few different repository protocols, so it serves as a bit of a one-stop shop that survives technology changes.

If you are fetching multiple GB of images over the network, it kinda makes sense.
Way more than “kinda”. If you have a continuous integration pipeline that checks out projects from scratch (as it should), every build fetches all dependencies, transitively.
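The following minimal Python sketch shows why the fetch is transitive: a from-scratch build has to download the full closure of the dependency graph, not just the direct dependencies. The graph and the 50-builds-per-day figure are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: package -> its direct dependencies.
Graph = Dict[str, List[str]]


def transitive_deps(graph: Graph, roots: List[str]) -> Set[str]:
    """Every package a from-scratch build must download (iterative depth-first search)."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already counted; this also makes cycles harmless
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    return seen


if __name__ == "__main__":
    graph = {
        "web": ["http", "json"],
        "http": ["tls"],
        "orm": ["db-driver", "json"],
    }
    deps = transitive_deps(graph, ["web", "orm"])  # the project's direct dependencies
    builds_per_day = 50  # hypothetical CI load
    # prints "6 packages per build, 300 upstream fetches per day without a cache"
    print(f"{len(deps)} packages per build, {len(deps) * builds_per_day} upstream fetches per day without a cache")
```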

Even ignoring download costs, a local cache (one of the functions of an artifactory) speeds up those downloads and, with them, your builds. It probably also helps you avoid getting blacklisted by code repositories.

An artifactory also automatically backs up any libraries you use. That protects against them disappearing from the internet.
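Both benefits (fewer upstream downloads, and a copy that outlives upstream deletion) come from the pull-through cache pattern. Below is a minimal in-memory Python sketch. `fetch_upstream` is a hypothetical stand-in for the real registry download. A production cache would persist to disk and verify checksums.

```python
from typing import Callable, Dict, Optional

# Hypothetical upstream fetch: returns the artifact bytes, or None if it no longer exists.
Fetch = Callable[[str], Optional[bytes]]


class PullThroughCache:
    """Serve artifacts locally and contact upstream only on a cache miss."""

    def __init__(self, fetch_upstream: Fetch) -> None:
        self._fetch = fetch_upstream
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0

    def get(self, key: str) -> bytes:
        if key in self._store:
            return self._store[key]  # hit: no network traffic
        self.upstream_calls += 1
        data = self._fetch(key)
        if data is None:
            raise KeyError(f"{key} not found upstream or in cache")
        self._store[key] = data  # keep a copy that survives upstream deletion
        return data


if __name__ == "__main__":
    upstream = {"lodash-4.17.21.tgz": b"..."}  # hypothetical upstream contents
    cache = PullThroughCache(upstream.get)
    for _ in range(10):  # ten CI builds asking for the same artifact
        cache.get("lodash-4.17.21.tgz")
    del upstream["lodash-4.17.21.tgz"]  # the package disappears upstream
    print(cache.upstream_calls, cache.get("lodash-4.17.21.tgz"))  # one upstream call, still served
```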

I think the first wave of artifactory customers was also populated by companies with limited network connectivity. It’s nuts to run a Rails or J2EE project if your company is using a pair of 1MB modems for all traffic, even if the dependencies are relatively small. Branch offices are similarly hamstrung. That was part of Perforce’s customer base as well, since they could run a local proxy for source code.

As you get into CI/CD, you start to notice that your upstream repo is occasionally down, because the outage gets in the way of some deadline.

We use Nexus and cache all of our packages, but it is one more system to maintain and update. Still, Nexus is a great asset; it has almost never given us trouble.
AWS has a managed service, CodeArtifact, that supports the common package repository formats and can cache upstream repos. Granted, it doesn't work with Docker images, but you asked about packages.
Someone would have to support it 24x7, and we could never match the uptime of DockerHub/ACS/ECS. Since a production k8s deployment could spin up an instance at any time of day, something like five nines (99.999%), or at least four nines (99.99%), of uptime is pretty important.
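For scale, here is the arithmetic behind those targets: each extra nine cuts the yearly downtime budget by a factor of ten. This is a small Python helper that assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year at an availability of `nines` nines (5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.1f} minutes/year")
```

That works out to about 52.6 minutes per year for four nines and about 5.3 minutes per year for five nines. An on-call rotation for a self-hosted mirror has to fit inside that budget.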
I see.

I guess you could still fall back to the main package source if the local mirror is down.
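Here is a minimal Python sketch of that fallback. It assumes the mirror and upstream serve the same paths. The mirror URL is hypothetical; `https://registry.npmjs.org` is npm's public registry, used here only as an example upstream.

```python
import urllib.request
from typing import Callable, Optional, Sequence

# Hypothetical internal mirror first, public upstream second.
SOURCES = ["https://npm-mirror.internal.example", "https://registry.npmjs.org"]

Fetcher = Callable[[str, str], bytes]


def http_fetch(base: str, path: str, timeout: float = 5.0) -> bytes:
    """Plain HTTP GET; urllib's URLError, HTTPError and timeouts are all OSError subclasses."""
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_fallback(sources: Sequence[str], path: str, fetch: Fetcher = http_fetch) -> bytes:
    """Try each source in order and return the first successful response."""
    last_error: Optional[OSError] = None
    for base in sources:
        try:
            return fetch(base, path)
        except OSError as exc:  # refused, timed out, DNS failure, HTTP error status
            last_error = exc
    raise RuntimeError(f"all sources failed for {path!r}") from last_error
```

For example, `fetch_with_fallback(SOURCES, "lodash")` would try the mirror first and only reach upstream if the mirror errors out. Passing `fetch` as a parameter keeps the retry logic testable without a network.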

As long as you don't need to push images:

1) set up a series of N docker registry mirrors in pull-through mode (https://docs.docker.com/registry/recipes/mirror/, it's as simple as "docker run --rm --name registry -d -p 5000:5000 -e REGISTRY_STORAGE_DELETE_ENABLED=true -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io -v /mnt/persistentdata/registry:/var/lib/registry registry")

2) expose them on the same domain name (multiple A records, loadbalancers, whatever you want)

3) set them as mirrors in each machine's docker daemon
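Step 3 comes down to the daemon's `registry-mirrors` setting. On Linux, this normally lives in `/etc/docker/daemon.json` and takes effect once dockerd is restarted or reloaded. Below is a minimal Python sketch that merges a mirror into an existing config without touching other keys. The mirror URL is hypothetical and assumes the load balancer from step 2 terminates TLS.

```python
import json
from typing import Any, Dict, List

DAEMON_JSON = "/etc/docker/daemon.json"  # default path on Linux
MIRROR_URL = "https://docker-mirror.internal.example"  # hypothetical shared mirror name


def add_registry_mirror(config: Dict[str, Any], mirror_url: str) -> Dict[str, Any]:
    """Return a copy of a daemon.json config with mirror_url in "registry-mirrors"."""
    updated = dict(config)  # other daemon settings are carried over unchanged
    mirrors: List[str] = list(updated.get("registry-mirrors", []))
    if mirror_url not in mirrors:  # idempotent: safe to run on every provisioning pass
        mirrors.append(mirror_url)
    updated["registry-mirrors"] = mirrors
    return updated


def update_daemon_json(path: str = DAEMON_JSON, mirror_url: str = MIRROR_URL) -> None:
    """Rewrite daemon.json in place (needs root); restart or reload dockerd afterwards."""
    try:
        with open(path) as f:
            config = json.load(f)
    except FileNotFoundError:
        config = {}
    with open(path, "w") as f:
        json.dump(add_registry_mirror(config, mirror_url), f, indent=2)


if __name__ == "__main__":
    # Dry run on an in-memory config instead of the real file.
    print(json.dumps(add_registry_mirror({"log-driver": "json-file"}, MIRROR_URL), indent=2))
```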

If one of your mirrors goes down, take it out of the DNS/LB rotation. That's it.
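To decide which mirrors are healthy, you can rely on the Registry HTTP API V2 base endpoint, `GET /v2/`. A running registry answers it with 200, or with 401 when authentication is enabled. Below is a minimal Python sketch. The mirror hostnames are hypothetical. How the healthy list is pushed into DNS or the load balancer is deliberately left out.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Sequence

# Hypothetical mirror instances behind the shared name.
MIRRORS = ["https://mirror-1.internal.example", "https://mirror-2.internal.example"]


def registry_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Probe the registry's /v2/ base endpoint; 200 or 401 means the server is answering."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/v2/", timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:  # must come before OSError (it is a subclass)
        return exc.code == 401  # up, but authentication is required
    except OSError:
        return False  # refused, timed out, DNS failure, ...


def healthy_mirrors(mirrors: Sequence[str], probe: Callable[[str], bool] = registry_is_up) -> List[str]:
    """The mirrors that should stay in the DNS/LB rotation."""
    return [m for m in mirrors if probe(m)]
```

A cron job could call `healthy_mirrors(MIRRORS)` every minute and update the rotation whenever the result changes.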

Why is no one talking about this solution?
Because users still need to update where they are pulling from.

There'd need to be a way within docker to alias to the new URL so that what normally would go to docker hub ends up pulling from the mirror.

No, they don't; that is the entire beauty of the pull-through mirror. For user code, as long as it only references Docker Hub images, nothing needs to change (edit: except GitLab CI configurations using docker:dind, which need to be told about the mirror).

The only downside is, as mentioned, that it can't cache third-party registries (quay.io comes to mind for people involved in k8s). For those, one has to mess with resolv.conf and self-signed HTTPS certificates for the Docker registry mirror.
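The mirror covers Docker Hub only because the daemon consults `registry-mirrors` only for images whose reference resolves to Docker Hub. Below is a simplified Python sketch of Docker's defaulting rule. A first path component containing `.` or `:`, or equal to `localhost`, is treated as a registry host. Anything else goes to `docker.io`, with single-name images placed under `library/`. The real parser also validates names, tags and digests, which this sketch skips.

```python
from typing import Tuple

DOCKER_HUB = "docker.io"


def split_registry(ref: str) -> Tuple[str, str]:
    """Split an image reference into (registry host, remainder) using Docker's defaulting rule."""
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest  # explicit registry host, e.g. quay.io or localhost:5000
    # No explicit registry: the image lives on Docker Hub; official images sit under "library/".
    return DOCKER_HUB, ref if sep else f"library/{ref}"


def uses_hub_mirror(ref: str) -> bool:
    """True when a pull of this reference can be served by the daemon's registry-mirrors."""
    return split_registry(ref)[0] == DOCKER_HUB


if __name__ == "__main__":
    for ref in ("nginx:1.25", "bitnami/redis", "quay.io/coreos/etcd:v3.5.0"):
        print(ref, split_registry(ref), uses_hub_mirror(ref))
```

This is why `FROM nginx` keeps working unchanged through the mirror, while anything pulled from quay.io bypasses it.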

Nothing. We did this when NPM was having issues, and it worked very well for us. We also did it for some non-US teammates who had very poor NPM performance.

It runs well, is easy to keep up and running, and has generally been awesome.