Hacker News
by dhruvrrp 1539 days ago
Slight nitpick, but `apt-get update && apt-get install -y openssl dumb-init iproute2 ca-certificates` in the Dockerfile is not the recommended approach.

That command itself means that a docker container is no longer reproducible. You cannot rebuild it (with any code changes for your service) and be guaranteed to get the same image as the one that might be in production, because the packages may have changed in the meantime.

Always better to go with the base image, add your packages to the base and then use that new image as the base image for your application.

6 comments

> That command itself means that a docker container is no longer reproducible.

It's a tradeoff between making container images reproducible, and not shipping security vulnerabilities.

People tend to prefer the latter.

Furthermore, you can exec your way into a container and check exactly which package version you installed.

> It's a tradeoff between making container images reproducible, and not shipping security vulnerabilities.

You can regenerate your base images every day or more often and have consistent containers created from an image. Freshly generated image can be tested in a pipeline to avoid issues and you won't hit issues like inability to scale due to misbehaving new containers.

> You can regenerate your base images every day or more often and have consistent containers created from an image.

That solves nothing, as it just moves the unreproducibility to a base image at the cost of extra complexity. Arguably it can even make the problem worse, as you add a delta between updates where there is none if you just run `apt-get upgrade`.

> Freshly generated image can be tested in a pipeline to avoid issues and you won't hit issues like inability to scale due to misbehaving new containers.

You already get that from container images you build after running `apt-get upgrade`.

`apt` runs during the creation of 1-3 VM images per architecture and not during creation of dozens of container images based on each VM image.

When we have VM images upon which all our usual Docker images were successfully built, we trust them more than `FROM busybox/alpine/ubuntu` followed by Docker builds. I've detailed the process in a neighboring comment[1] but you're right that it doesn't suit all workflows.

[1] https://news.ycombinator.com/item?id=30810251

For AMIs (and other VM images) it might make more sense. With containers? Not so much. And with a distributed socket image caching layer it makes even less sense.
We have a maximum image age of 60 days at work. You gotta rebase at a minimum of 60 days or when something blows up. Keeps everyone honest and honestly not that bad. New sprint new image then promotion. And with a container repository and it being internal does reproducibility really matter? Just pull an older version if push comes to shove.
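A policy like that is easy to check in CI. A minimal sketch in Python, assuming you record the build date of each image somewhere; `needs_rebase` and `MAX_IMAGE_AGE` are made-up names for illustration, not part of any tool:

```python
from datetime import date, timedelta

# Maximum image age from the policy described above (60 days).
MAX_IMAGE_AGE = timedelta(days=60)


def needs_rebase(built_on: date, today: date,
                 max_age: timedelta = MAX_IMAGE_AGE) -> bool:
    """Return True once the image is older than the allowed maximum age."""
    return today - built_on > max_age
```

An image built exactly 60 days ago still passes; it is flagged on day 61.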
I don't know (I know) why people aren't moving to platforms like lambda to avoid NIH-ing system security patching operations. We can still run mini monoliths without massive architectural change if we don't get too distracted by FaaS microservice hype
Why would someone pay per-request when you can have infinite always-warm requests for a flat-rate?
When your workloads are unpredictable and spike so suddenly that you can't scale quickly enough without keeping a bunch of spare capacity waiting around, and you have HA requirements. In this scenario, more is spent on avoiding variable spend to achieve a "flat" rate.
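To make that trade-off concrete, here is a back-of-the-envelope sketch in Python. All names and parameters are hypothetical, and it assumes the flat-rate setup keeps enough instances for the peak hour running the whole time. That is the scenario described above, not a real pricing model.

```python
import math


def flat_rate_cost(hourly_requests, capacity_per_instance, instance_price_per_hour):
    """Cost when enough instances for the peak hour stay up for every hour."""
    peak = max(hourly_requests)
    instances = math.ceil(peak / capacity_per_instance)
    return instances * instance_price_per_hour * len(hourly_requests)


def per_request_cost(hourly_requests, price_per_request):
    """Cost when you pay only for the requests actually served."""
    return sum(hourly_requests) * price_per_request
```

With a spiky series the flat-rate figure is driven by the single peak hour, while the per-request figure is driven by the total. With a steady series, the comparison can flip the other way.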
> I don't know (I know) why people aren't moving to platforms like lambda to avoid NIH-ing system security patching operations.

Perhaps because people do their homework and just by reading the sales brochure they understand that lambdas are only cost-effective as handlers of low-frequency events, and they drag in extra costs by requiring support services to handle basic features like logging, tracing, and even handling basic http requests.

maybe for very predictable workloads. arrogant of you to say adopters haven’t done their homework
The image is already built, so it won't rerun those commands when scaling up new instances. Or am I misunderstanding your comment?
Basically you recreate your personal base image (with the apt-get commands) every X days, so you have the latest security patches. And then you use the latest of those base images for your application. That way you have a completely reproducible docker image (since you know which base image was used) without skipping on the security aspect.
> Basically you recreate your personal base image (with the apt-get commands) every X days, so you have the latest security patches.

How exactly does that a) assure reproducibility if you use a custom unreproducible base image, b) improve your security over daily builds with container images built by running `apt-get upgrade`?

In the end that just needlessly adds complexity for the sake of it, to arrive at a system that's neither reproducible nor equally secure.

If I build an image using the Dockerfile in the blog post 10 days later, there is no guarantee that my application would work. The packages in Ubuntu's repositories might be updated to new versions that are buggy/no longer compatible with my application.

OP's suggestion is to build a separate image with required packages, tag it with something like "mybaseimage:25032022" and use it as my base image in the Dockerfile. This way, no matter when I rebuild the Dockerfile, my application will always work. You can rebuild the base image and application's image every X days to apply security patches and such. This also means I now have to maintain two images instead of one.

Another option is to use an image tag like "ubuntu:impish-20220316" (instead of "ubuntu:21.10") as base image and pin the versions of the packages you are installing via apt.

I personally don't do this since core packages in Ubuntu's repositories rarely introduce breaking changes in the same version. Of course, this depends on package maintainers, so YMMV.
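For the date-stamped base image tag mentioned above, a rough sketch in Python of how a build script could generate it. `base_image_tag` and `from_line` are hypothetical helpers, and the day-month-year format is only inferred from the "mybaseimage:25032022" example:

```python
from datetime import date


def base_image_tag(name: str, built_on: date) -> str:
    """Build a tag such as 'mybaseimage:25032022' (day, month, year)."""
    return f"{name}:{built_on.strftime('%d%m%Y')}"


def from_line(tag: str) -> str:
    """Render the FROM line that the application's Dockerfile would use."""
    return f"FROM {tag}"
```

A year-month-day format (`%Y%m%d`) would make the tags sort chronologically, which may be worth the change if you keep many of them around.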

Eh, that’s a heavy-handed and not great way of ensuring reproducibility.

The smart way of doing it would be to:

1. Use the direct SHA reference to the upstream “Ubuntu” image you want.

2. Have a system (Dependabot, Renovate) to update that periodically.

3. When building, use “cache from” and “cache to” to push the image cache somewhere you can access

And… that’s it. You’ll be able to rebuild any image that is still cached in your cache registry. Just reuse an older upstream Ubuntu SHA reference and change some code, and the apt commands will be cached.
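For step 1 above, a tiny lint can catch FROM lines that still use a mutable tag. A rough Python sketch: `unpinned_from_lines` is a made-up helper, not part of Docker, Dependabot, or Renovate. It treats any reference without an `@sha256:` digest as unpinned, so stage names and `scratch` would be flagged too.

```python
import re

# A digest reference ends with @sha256: followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_from_lines(dockerfile_text: str) -> list[str]:
    """Return the image references in FROM lines that lack a sha256 digest."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip optional flags such as --platform=linux/amd64.
            refs = [p for p in parts[1:] if not p.startswith("--")]
            if refs and not DIGEST_RE.search(refs[0]):
                unpinned.append(refs[0])
    return unpinned
```

Failing the build when the returned list is non-empty would keep tag-based references from creeping back in.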

I apply security patches, necessary updates, and the like during system image creation (the VM image, for example an AWS AMI, that is later referred to in the Dockerfile's FROM). HashiCorp's Packer[1] comes in handy. System images are built and then tested in an automated fashion with no human involvement.

The testing phase involves building a Docker image from the fresh system image, creating container(s) from the new Docker image, and testing the resulting systems, applications, and services. If everything goes well, the system image (not the Docker image) replaces the previously used system image (the one without current security patches).

We create Docker images fairly dynamically and frequently. Subsequent builds based on the same system image are consistent and don't cause problems like an inability to scale. Docker does not mess with the system prepared by Packer: it doesn't run apt or download from third-party remote hosts, and only issues commands that produce consistent results.

This way we no longer have issues like an inability to scale with new Docker images, and humans are rarely bothered apart from issues in the testing phase. No problems with containers though, as no untested stuff is pushed to registries.

[1] https://www.packer.io/

Wow, I mixed up VMs and Docker images a bit in the above post. We're using Packer for both.
I mean, HN is the land of "offload this to a SaaS" and when we can actually offload something to a distro, like "guarantee that an upgrade in the same distro version is just security patches and won't break anything", it is recommended to avoid doing it?
Security assfarts will yell at you for either approach. It'll just be different breeds yelling at you depending which route you go, and which one most recently bit people on the ass.
That's a bold claim. Do you have any references to support it? The examples in Docker's documentation use apt-get directly and I don't see any recommendation to use a base image as you describe.[1][2]

With Debian, there are snapshot images[3] which seem like a better approach for making apt-get reproducible. You'd simply have to change the "FROM" line in the Dockerfile to something like "FROM debian/snapshot:stable-20220316" (where 20220316 is the date of the image you are trying to reproduce, helpfully given in /etc/apt/sources.list).
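As a rough illustration in Python, the FROM line could be derived from the image's sources.list. This assumes the date shows up there as a snapshot.debian.org timestamp like `20220316T000000Z`; that format is my assumption, and `snapshot_from_line` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumption: sources.list mentions a snapshot.debian.org archive URL whose
# path contains a timestamp such as 20220316T000000Z.
SNAPSHOT_RE = re.compile(r"snapshot\.debian\.org/archive/debian/(\d{8})T\d{6}Z")


def snapshot_from_line(sources_list: str, suite: str = "stable") -> Optional[str]:
    """Return a FROM line for the matching debian/snapshot tag, if a date is found."""
    match = SNAPSHOT_RE.search(sources_list)
    if match is None:
        return None
    return f"FROM debian/snapshot:{suite}-{match.group(1)}"
```

If no snapshot timestamp is found, the function returns None rather than guessing a date.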

With the approach you describe, you would have to carefully manage the base images: tag them, record which one was used to create each application image, and keep them around in order to reproduce older application images.

I'm sure there are situations where the approach you describe is useful (e.g. with other package managers, especially ones that don't have a notion of lockfiles), but it adds complexity and I don't think it's necessarily justified in the case of apt-get (at least on Debian).

[1]: https://docs.docker.com/engine/reference/builder/#exec-form-...

[2]: https://docs.docker.com/develop/develop-images/dockerfile_be...

[3]: https://hub.docker.com/r/debian/snapshot

But the base images themselves do not seem to be stable. As of today (Mar 26), the article's example ubuntu:21.10 was released on Mar 18 2022 [0]. So if the base image is not fixed, reproducibility is already gone.

[0] https://hub.docker.com/_/ubuntu?tab=tags&page=1&name=21.10

> Always better to go with the base image, add your packages to the base and then use that new image as the base image for your application.

But that makes your base image non-reproducible. You're just shifting the issue elsewhere.

Yes, this is bad not only from the reproducibility perspective, but you now also have two layers for the stuff that got updated.

I mean the unupdated files in the base image, plus the copy-on-write changes in the subsequent layers.

At that point your base image is not reproducible, so your improvement is going to be very limited.