Hacker News
by 5d8767c68926 1361 days ago
I have thought there needs to be a much more trivial plug-and-play caching solution that works for the major services: npm, pypi, cargo, docker, etc. Right now, it is justifiably annoying enough that nobody worries about it until they are squandering terabytes of bandwidth or dealing with an external outage.
3 comments

It's called HTTP. The problem with all these services is they're using HTTPS for encryption and verification.

Which breaks all normal caching proxies that can be easily used.

The popular packaging formats are better at this: distribute keys over HTTPS, then just use HTTP to download and verify packages. Squid and co work just fine here.
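
To make the "download over plain HTTP, then verify" idea concrete, here is a minimal sketch in Python. It assumes the trusted metadata (fetched over HTTPS or otherwise authenticated) lists a SHA-256 digest per package; the package name, the `trusted_index` dict, and the helper names are hypothetical and not taken from any real package manager.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_package(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded bytes against a digest from trusted metadata.

    The bytes may have come through any untrusted cache or proxy;
    only the expected digest needs to arrive over a trusted channel.
    """
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


# Hypothetical trusted index: in practice this would be fetched securely.
trusted_index = {"example-pkg-1.0.tar.gz": sha256_hex(b"package contents")}

# Bytes as they might arrive from a plain-HTTP caching proxy.
downloaded = b"package contents"
tampered = b"package contents with something extra"

ok = verify_package(downloaded, trusted_index["example-pkg-1.0.tar.gz"])
bad = verify_package(tampered, trusted_index["example-pkg-1.0.tar.gz"])
```

Because integrity comes from the digest rather than the transport, the package bytes themselves can sit in any ordinary HTTP cache.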

No one will bother putting in the effort until usage quotas and costs get involved. While repos allow unlimited free pulls, it makes no sense for users to spend a minute caching anything.

Once quotas come in, you’ll find all the tooling and guides make it super simple.

> No one will bother putting in the effort until usage quotas and costs get involved. While repos allow unlimited free pulls, it makes no sense for users to spend a minute caching anything.

I don't know about that. One of the benefits of introducing something like Nexus can be a noticeable speedup for your own builds, once your proxy repositories have the versions of dependencies you need cached.

Of course, you could use some sort of a local build cache (e.g. m2 directory for Maven packages on the server) but when you're building your own containers it doesn't always turn out to be as viable, especially when you have many CI nodes, each of which would have a local build cache.

On an unrelated note, something like Nexus also gives you the ability to easily start deploying/using your own container images, libraries or even arbitrary files, all without having to store your data in the cloud, or figure out how many different solutions/accounts you might need (otherwise you'd need some packages on Docker Hub, some Node packages on npm, some Java packages in Maven repos etc.).

There are very unsexy ways to reduce pull behaviour. In a previous life I built a CI/CD pipeline for an AI company, and some of the container images were huge (i.e. GBs), so I did a lot of work on image caching. Mainly this meant breaking up large containers into layers, starting with large layers with a longer shelf life and incrementally adding faster-changing or smaller layers on top. Any time a large layer changed, it would trigger a Packer rebuild of the CI runner AMI. The AMI also contained a start-up service that discovered and refreshed base images at boot time so they would (hopefully) never delay a real CI job. The natural scale up/down and max lifetime of the runner pool guaranteed that we never had an old AMI in use. It's likely that most people wouldn't bother with that level of optimization (or even host their own runners), but we needed it.
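
A rough sketch of the layer-ordering reasoning, in Python. It relies on the fact that changing one image layer invalidates every layer stacked on top of it; the layer names, sizes, and change rates are made-up illustrative numbers, not figures from the pipeline described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    size_mb: int
    changes_per_month: float  # hypothetical estimate of how often it changes


def order_layers(layers):
    """Put the most stable layers first; among equally stable, larger first."""
    return sorted(layers, key=lambda l: (l.changes_per_month, -l.size_mb))


def repull_cost_mb(ordered, changed_name):
    """MB to re-pull when one layer changes: that layer plus all above it."""
    names = [l.name for l in ordered]
    idx = names.index(changed_name)
    return sum(l.size_mb for l in ordered[idx:])


# Illustrative layers only.
layers = [
    Layer("app-code", 50, 60),
    Layer("base-runtime", 4000, 0.2),
    Layer("dependencies", 1500, 4),
]

ordered = order_layers(layers)
for layer in ordered:
    print(layer.name, repull_cost_mb(ordered, layer.name), "MB")
```

With this ordering, the frequent change (app code) costs a small pull, while the expensive re-pull only happens when the rarely changing base layer moves.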
Artifactory and Nexus work great for that.