Hacker News
by dividedbyzero 1271 days ago
Could you elaborate on the Docker Hub situation?

Very low rate limits. We rehost images we use often off Docker Hub.
AWS also mirrors the Docker Official Images: https://gallery.ecr.aws/docker

Announcement here (Nov 2021) https://www.docker.com/blog/news-from-aws-reinvent-docker-of...

That's really cool and I didn't realize that! For future readers, key points from the article:

"Note that while pulls from ECR Public do work from outside AWS, they are rate limited if not authenticated with an Amazon account, and you should generally use the Docker Hub addresses if you are pulling from outside AWS. Please see the ECR Public quotas documentation for more about how limits work with ECR Public.

If you are an AWS customer, pulling Docker Official Images from ECR Public offers several advantages. ECR Public is replicated across all AWS regions, so pulls are local to the region you pull from. This helps ensure lower latency for requests and ensures that all your resources are in the same failure zone, which is the recommended architectural pattern."
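
For anyone who wants to try this, here is a minimal sketch of mapping a Docker Official Image name to its ECR Public address. It assumes the official images sit under public.ecr.aws/docker/library/, as the gallery linked above shows. The helper name ecr_public_ref and the image tag are made up for illustration.

```shell
# Map a Docker Official Image reference (e.g. "python:3.12") to its
# ECR Public address. Assumes the public.ecr.aws/docker/library/ prefix;
# ecr_public_ref is a hypothetical helper name.
ecr_public_ref() {
  printf 'public.ecr.aws/docker/library/%s\n' "$1"
}

# Usage (needs Docker; the tag is only an example):
#   docker pull "$(ecr_public_ref python:3.12)"
#
# To authenticate the pulls with an AWS account first:
#   aws ecr-public get-login-password --region us-east-1 \
#     | docker login --username AWS --password-stdin public.ecr.aws
```

Per the quoted article, this mostly makes sense when the pull happens from inside AWS.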

I work for a company that is owned by Amazon and has used Docker Hub for quite a while for a lot of the base images in our builds. The number of times we have had build failures or minor outages because Docker Hub was down or we were rate limited is well into four digits. Luckily these were generally low impact, and a developer could emergency patch in some of these situations, so it never really hurt us. But as someone who has been pushing for us to just use the AWS alternative, since we are very heavily on AWS anyway, it has always been a little frustrating that people pull directly from the internet as opposed to the AWS data center they are literally running in. So I'm very happy to have these base images on my compute platform at a very low (network) cost and with high availability.

As the AWS public docs say, it's always better to pull from the data center you're sitting in. The data center math is always more forgiving if you pull from the data center you're in as opposed to pulling from another one, because the chance that at least one of two data centers is having problems is higher than the chance that a single one is.
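
To put rough numbers on that: if each data center is independently unavailable with some probability p, a pull that depends on n of them fails with probability 1 - (1 - p)^n, which grows with n. A small sketch under that independence assumption, with a made-up p of 1% (fail_chance is a hypothetical helper name):

```shell
# Probability that a pull fails when it depends on n data centers,
# each independently unavailable with probability p (assumed values).
fail_chance() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}

fail_chance 0.01 1   # only the data center you run in: prints 0.0100
fail_chance 0.01 2   # your data center plus a remote one: prints 0.0199
```

Real failures are not perfectly independent, so treat this as an illustration rather than a measurement.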

In case not everyone knows, it's super easy to host your own container registry, for public or private use. It's basically just this:

    docker run -d -p 5000:5000 --name registry registry:2
plus more options for things like auth and certs. Infra might include a backing disk and a load balancer; if you need to scale, run several and keep them in sync with one of the many open tools, e.g. regclient.
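
As a rough sketch of the auth and certs part, the registry image can be configured through environment variables. This assumes you already have an htpasswd file (bcrypt entries) in ./auth and a certificate and key in ./certs. Those paths, the realm string, and the function name start_secure_registry are placeholders.

```shell
# Start registry:2 with basic auth and TLS.
# Assumes ./auth/htpasswd, ./certs/domain.crt and ./certs/domain.key exist;
# these paths and the function name are hypothetical.
start_secure_registry() {
  docker run -d -p 5000:5000 --name registry \
    -v "$PWD/auth:/auth" \
    -v "$PWD/certs:/certs" \
    -e REGISTRY_AUTH=htpasswd \
    -e REGISTRY_AUTH_HTPASSWD_REALM="Registry Realm" \
    -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
    -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
    -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
    registry:2
}
```

Clients would then `docker login` against the registry host before pushing or pulling.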

Also, plenty of cloud services now offer registries, like GHCR, ECR, etc., which are basically pay per GB.

https://docs.docker.com/registry

https://github.com/regclient/regclient

Yeah. All of our proprietary images are on GHCR, so we also rehost the public ones on it as well. We have GitHub Actions that repull and republish, whether on cron or on trigger.
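
For the curious, the repull-and-republish step can be as small as the sketch below. This is not our actual workflow, just an illustration: the ghcr.io/example-org/mirror namespace and the image names are made up, and it assumes the job has already run `docker login ghcr.io`.

```shell
# Re-pull public images and republish them under our own namespace.
# TARGET is a hypothetical namespace; override it via the environment.
TARGET="${TARGET:-ghcr.io/example-org/mirror}"

# Build the rehosted reference for a source image.
target_ref() { printf '%s/%s\n' "$TARGET" "$1"; }

# Pull, retag and push each image given as an argument; stop on first failure.
mirror_images() {
  for image in "$@"; do
    docker pull "$image" || return 1
    docker tag "$image" "$(target_ref "$image")" || return 1
    docker push "$(target_ref "$image")" || return 1
  done
}

# e.g. from a scheduled CI job:
#   mirror_images alpine:3.19 nginx:1.25
```

Note that plain pull/tag/push only copies the platform the runner pulled, so multi-arch images would need a registry-to-registry copy tool instead.
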
How often do you synchronize them with Docker Hub?
If it's an important image that is part of prod, it's pinned to a specific version anyway and gets upgraded like a software dependency, after testing, about once a week. If it's a utility image (e.g. for the cloud dev environment), there's a periodic job that checks for updates every hour, because no one would care enough to update it manually.
You can configure Docker's reference registry software as a pull-through cache: https://docs.docker.com/registry/recipes/mirror/
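
A minimal sketch of that setup, assuming the registry's proxy setting is passed as an environment variable and that port 5000 on localhost is acceptable for a test. The function name start_hub_mirror and the container name are placeholders.

```shell
# Run registry:2 as a pull-through cache of Docker Hub.
# start_hub_mirror is a hypothetical helper name; port 5000 is arbitrary.
start_hub_mirror() {
  docker run -d -p 5000:5000 --name registry-mirror \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    registry:2
}

# Then point the Docker daemon at the mirror in /etc/docker/daemon.json
# and restart the daemon:
#   { "registry-mirrors": ["http://localhost:5000"] }
```

For anything beyond a local test you would likely want TLS in front of the mirror and persistent storage behind it.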

I think other container registries like Artifactory also support pull-through caching.