by mr-karan 1526 days ago
While this is a remarkably good hack and I did learn quite a bit from reading the post, I'm simply curious about the motivation behind it. A Docker image with Caddy/NGINX, even if it's a few MBs, should ideally be pulled just once on the host and sit there cached. Assuming this is OP's personal server and there's not much churn, this image could stay in the cache until a new tag is pushed/pulled. So, from a "hack" perspective, I totally get it, but from a more pragmatic POV, I'm not quite sure.
3 comments

It gets pulled once per host, but with autoscaling, hosts come and go pretty frequently. It's a really nice property to be able to scale quickly with load, and small images tend to help with this in a variety of ways (pulling but also instantiating the container). Most sites won't need to scale like this, however, because one or two hosts are almost always sufficient for all the traffic the site will ever receive.
I did mention that it's the OP's server which I presume isn't in an autoscale group.

Even then, saving a few MBs in image size is the devops equivalent of premature optimisation.

There's so much that happens in an autoscaling group before the instance is marked healthy to serve traffic that an image pull of a few MBs is, in the grand scheme of things, hardly ever an issue worth focusing on.

Yeah, like I said, I'm not defending this image in particular--most static sites aren't going to be very sensitive to autoscaling concerns. I was responding generally to your reasoning of "the host will just cache the image" which is often used to justify big images which in turn creates a lot of other (often pernicious) problems. To wit, with FaaS, autoscaling is highly optimized and tens of MBs can make a significant difference in latency.
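To put rough numbers on the "tens of MBs" point, here is a back-of-the-envelope sketch. The image sizes and the 400 Mbit/s effective registry throughput are made-up assumptions for illustration, not measurements, and the estimate ignores decompression and layer extraction, which would add more on top.

```python
def pull_seconds(image_mb: float, bandwidth_mbit_per_s: float) -> float:
    """Seconds to transfer image_mb megabytes over a link of the given speed."""
    # 1 megabyte = 8 megabits
    return image_mb * 8 / bandwidth_mbit_per_s


# Hypothetical numbers: a 5 MB image vs a 50 MB image on an assumed
# 400 Mbit/s effective throughput to the registry.
small = pull_seconds(5, 400)
large = pull_seconds(50, 400)
extra_ms = (large - small) * 1000

print(f"small: {small:.2f}s, large: {large:.2f}s, extra: {extra_ms:.0f} ms")
```

Under those assumptions the larger image costs close to an extra second per cold start, which is negligible for a long-lived host but noticeable if it is paid on every FaaS cold start.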
Noted, that makes sense. Thanks!
This could be very useful in the serverless space, as Lambda supports container images now. The image will be pulled much more often there.
The fewer resources you use from your system, the more things you can do with your system.
That only matters if you're actually using those extra cycles. The majority of web servers hover at <10% CPU just waiting for connections.
I don't know if that's really true - if you're renting the server from a cloud provider chances are you can bump down the instance size if you don't need the extra processing capacity... and if it's a server you manually maintain I think lighter usage generally decreases part attrition, though the other factors in that are quite complex.
I feel like there's a lot of low-hanging fruit for containers, and it's weird we don't try to optimize loading. I could be wrong! This seems like a great sample use case: wanting a fast/low-impact simple webserver for any of a hundred odd purposes. Imo there are a lot of good strategies available for making even significantly larger containers start very fast!

We could be using container snapshots/checkpoints so we don't need to go through as much initialization code. This would imply, though, that we configure via the file-system or something else we can attach late, instead of the 12-factor approach of configuring via env vars, which is the standard/accepted convention these days. Actually I suppose environment variables are writable, but the webserver would need to be able to re-read its config, accept a SIGHUP or whatever.
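A minimal sketch of the "re-read config on SIGHUP" idea, assuming a JSON config file on disk. The `ReloadableConfig` class and the file layout are hypothetical names invented for illustration; this is not how Caddy, NGINX, or any checkpoint tool does it. The handler only sets a flag, and the actual file read happens later in normal code.

```python
import json
import signal


class ReloadableConfig:
    """Holds config loaded from a JSON file and reloads it after SIGHUP."""

    def __init__(self, path):
        self.path = path
        self.reload_requested = False
        self.values = self._load()

    def _load(self):
        with open(self.path) as f:
            return json.load(f)

    def install(self):
        # SIGHUP is Unix-only, and signal.signal must run in the main thread.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Keep the handler trivial: just record that a reload was requested.
        self.reload_requested = True

    def current(self):
        # Call this from the server loop, e.g. once per accepted connection.
        if self.reload_requested:
            self.reload_requested = False
            self.values = self._load()
        return self.values
```

A process restored from a snapshot could then be pointed at a late-attached config file and sent `kill -HUP <pid>`, rather than relying on env vars captured at checkpoint time.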

We could try to pin some specific snapshots into memory. Hopefully Linux will keep any frequently booted-off snapshot cached, but we could try & go further & try to make sure hosts have the snapshot image in memory at all times.
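As a rough sketch of nudging a snapshot file into memory: on Linux, `posix_fadvise` with `POSIX_FADV_WILLNEED` asks the kernel to read a file into the page cache. Note this is only a hint and is weaker than real pinning (the kernel can still evict the pages under memory pressure). The `warm_page_cache` helper name is made up for illustration.

```python
import os


def warm_page_cache(path: str) -> int:
    """Hint the kernel to pull the file into the page cache; return its size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if hasattr(os, "posix_fadvise"):
            # Advisory only: the kernel may ignore it or evict the pages later.
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        else:
            # Fallback for platforms without posix_fadvise: read it through once.
            while os.read(fd, 1 << 20):
                pass
        return size
    finally:
        os.close(fd)
```

Actually guaranteeing residency would need something like `mlock` on a mapping of the file, which is outside what this sketch attempts.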

I want to think that common overlay systems like overlayfs or btrfs or whatever will do a good job of making sure that, if everyone is asking for the same container, they're sharing some caches effectively. Validating & making sure would be great to see. To be honest I'm actually worried the need-for-speed attempt to snapshot/checkpoint a container & re-launch it might conflict somewhat: rather than creating a container fs from existing pieces & launching a process mapped to that fs, I'm afraid the process snapshot might re-encode the binary? Maybe? We'd keep getting to read from the snapshot I guess, which is good, but there'd be some duplication of the executable code, once in the container image and then again in the snapshotted process image.