Hacker News
by merpkz 46 days ago
How do those of you who run Docker in production manage the nftables firewall on hosts running containers? By design, the Docker daemon creates and manages a set of firewall rules to forward traffic between containers, route ingress traffic into containers, and masquerade outgoing container traffic. That is all well and good until an admin needs to alter the host's firewall to allow or deny other traffic unrelated to Docker. Restarting nftables, or even just applying new nftables rules, usually purges all the Docker-created rules (because of the flush ruleset in /etc/nftables.conf) and effectively breaks everything until the Docker daemon is restarted and the rules are re-created. I have partially solved this by using nftables filter chains with different names (admin_input/admin_output) and an input hook with negative priority, so that the traffic I choose to block is evaluated before the Docker rules are applied. That feels a bit like a hack, but so far it is the only way I have found. It is good practice in this day and age to run local firewalls on all hosts with a default-deny policy, so that only explicitly allowed traffic can pass; that can severely limit the blast radius during a compromise.
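A minimal sketch of the approach described above, under stated assumptions: the admin rules live in their own table, and a reload replaces only that table instead of running flush ruleset, so rules created by Docker are left alone. The table name admin and the address 203.0.113.10 are placeholders that do not come from the post; the chain names and the negative priority follow it.

```nftables
#!/usr/sbin/nft -f
# Hypothetical /etc/nftables.conf without "flush ruleset".

# Create the table if it is missing, then delete and redefine it.
# Only this table is replaced; tables created by Docker are untouched.
table inet admin
delete table inet admin

table inet admin {
    chain admin_input {
        # Priority -10 is evaluated before chains at the default filter
        # priority (0). A drop here is final; an accept only hands the
        # packet on to later chains registered on the same hook.
        type filter hook input priority -10; policy drop;

        iif "lo" accept
        ct state established,related accept
        ct state invalid drop

        # Broad ICMP/ICMPv6 accept; narrow to specific types as needed.
        meta l4proto { icmp, ipv6-icmp } accept

        # Placeholder admin address (documentation range).
        ip saddr 203.0.113.10 tcp dport 22 accept
    }

    chain admin_output {
        type filter hook output priority -10; policy accept;
    }
}
```

Loading this file with nft -f replaces the admin table in one transaction. Note that the input hook only sees traffic addressed to the host itself; traffic to published container ports is forwarded, so it would need a chain on the forward hook instead.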
8 comments

My containers run in dedicated "docker host" VMs. And I never expose ports on 0.0.0.0, just the private internal IP. Most (all) of my docker hosts do not have a public IP anyway. I use wireguard to access them myself. If they need to be public I reverse proxy with caddy from my web server (or use Authentik's embedded proxy). These servers have access to the same private LAN which could be hardened without having the issues you brought up.

By the way, most Docker-based setups do not actually need the userland proxy that Docker runs automatically. Disable it in /etc/docker/daemon.json:

{
    "userland-proxy": false
}
https://www.macchaffee.com/blog/2024/you-have-built-a-kubern...

Like, if that works for you, more power to you. But that is a lot of moving parts in exchange for using a tool whose value prop is that it doesn't have many.

That's neither Kubernetes nor a lot of moving parts, just basic sysadmin setup for good hygiene and peace of mind.
I wish. There's nothing like Kubernetes here nor the features it gives you or any need for them. Just some basic sys admin stuff that works well for me.
This is the way. I ended up using an identical setup.
What would the config look like if I have my docker containers split up over multiple VMs?
I have all of mine on the same (or accessible) internal LAN so they can all talk to each other. You can get the connection going with Wireguard if they are in different places in terms of networking.
As in you have a VLAN just for the docker containers to talk to each other on?
Amounts to the same thing, but no. Proxmox servers with two bridged interfaces: one interface has a public IP, the other a private subnet such as 10.0.10.0/24. Multiple bare-metal servers are connected by WireGuard and have access to each other's private subnets; another one might be 10.0.20.0/24, for example. Set up the routes and you're good to go. Firewall to taste. My private LAN is all open.

This is not just for Docker. There are other VMs and LXC containers too.

Very interesting way to set things up. Thanks for the breakdown! It's given me some ideas for our non-prod Proxmox cluster.
Could you elaborate on your setup? Is the docker host also your web server on which you run caddy?
No, it just needs to have a route to the internal IP of the Docker host, and you expose your ports on that IP. Let me know if you need more details. You could also put the reverse proxy (Caddy in my case) on the Docker host.
I reverse proxy everything through a Caddy instance running on the same machine, so I avoid the firewall dance entirely by just prefixing all my port assignments in the compose file with the loopback IP (e.g. 127.0.0.1:3000:3000). Nftables denies all but 80 and 443, and I don't have to worry about restarts/flushes breaking things.
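As a rough illustration of the "denies all but 80 and 443" part, here is a sketch of a host ruleset, assuming the containers are only published on 127.0.0.1 as in the compose example above. The table name host_filter is hypothetical, and the SSH rule is an addition so the sketch does not lock the admin out; drop it if access goes through a VPN or console.

```nftables
#!/usr/sbin/nft -f
# Hypothetical host firewall: only the reverse proxy ports are reachable.

table inet host_filter
delete table inet host_filter

table inet host_filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # Loopback must stay open: the proxy reaches the containers
        # through ports published on 127.0.0.1.
        iif "lo" accept
        ct state established,related accept

        # Broad ICMP/ICMPv6 accept; narrow to specific types as needed.
        meta l4proto { icmp, ipv6-icmp } accept

        # Reverse proxy (HTTP and HTTPS).
        tcp dport { 80, 443 } accept

        # Assumption, not in the comment: keep SSH reachable.
        tcp dport 22 accept
    }
}
```

Because the file replaces only its own table and never flushes the whole ruleset, reloading it should not remove the rules Docker created.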
A really nifty thing is that you can also of course bind this to the device's tailscale ip!

Also you don't even need the loopback address if the traffic is between one container and another, just a bridge network is fine.

This is how I self-host all my home services (Home Assistant, pfSense, Frigate, etc.). I do not for the life of me understand why so many folks doing self-hosted services for themselves put them on the public internet.

Caddy will even issue fully automated, valid TLS certificates for sites on private IP ranges via the ACME DNS challenge, for free and with renewals handled, so all my internal self-hosted sites have properly terminated TLS too, accessible by connected VPN clients.

It's funny that for many of us in our day job, we stand up private services behind a VPN all the time so only work clients can access it, but when self hosting don't bother with a simple wireguard/tailscale config etc.

A lot of people using Docker or even k8s don't know that by default, a service is available to all other services via the service name defined in the compose file or your YAML specs. Docker Compose builds an implicit bridge network. Most internet tutorials are wrong here and bind ports publicly on your IPv4 interface. So if you follow them, you'll accidentally expose your database or similar to the public web.
This is surely the easiest and I would guess the safest way, and has the added benefit that your proxy (nginx in my case) can handle SSL for you, making certificate deployment a breeze.
On my docker hosts there is no other traffic unrelated to docker. Everything goes in containers.
Well, as an example, we usually set incoming rules to allow SSH only from administrator IP addresses and TCP 10050 only from the Zabbix monitoring server, and to leave open the few ICMP types that are required; the rest is dropped and logged.

For the forward chain, we allow the Docker network ranges to route between themselves, and allow only the services actually used in containers. Containers may make outgoing connections to our DNS servers, the centralized HTTP proxy server, and monitoring; they are not allowed to route anywhere else.

Output is similar: we only allow our DNS servers, NTP, the HTTP proxy, the centralized rsyslog server where all logs go, the Zabbix monitoring server, and a few ICMP types. Nothing else gets out, and blocked attempts are logged.

With the advent of the supply chain attacks we often read about here, it's just a matter of time before some container is compromised, and this seems like the only viable way to at least somewhat limit the impact when such an event occurs.
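The forward and output policy described above could be sketched roughly as follows. Everything concrete here is an assumption for illustration: the table name egress, the container range 172.16.0.0/12, the container address and port, the server addresses (documentation range 192.0.2.0/24), and the proxy and syslog ports. Zabbix and finer ICMP rules are left out for brevity.

```nftables
#!/usr/sbin/nft -f
# Sketch only: every address, subnet, and port below is a placeholder.

table inet egress
delete table inet egress

table inet egress {
    chain forward {
        # Evaluated before priority 0 chains; a drop here is final.
        type filter hook forward priority -10; policy drop;
        ct state established,related accept

        # Docker network ranges may talk to each other.
        ip saddr 172.16.0.0/12 ip daddr 172.16.0.0/12 accept

        # Inbound to one published container service
        # (address and port as seen after DNAT).
        ip daddr 172.17.0.2 tcp dport 8080 accept

        # Containers may reach DNS and the HTTP proxy, nothing else.
        ip saddr 172.16.0.0/12 ip daddr 192.0.2.53 udp dport 53 accept
        ip saddr 172.16.0.0/12 ip daddr 192.0.2.53 tcp dport 53 accept
        ip saddr 172.16.0.0/12 ip daddr 192.0.2.80 tcp dport 3128 accept

        # Log whatever is about to hit the drop policy.
        log prefix "fwd-drop: "
    }

    chain output {
        type filter hook output priority -10; policy drop;
        oif "lo" accept
        ct state established,related accept

        # Broad ICMP/ICMPv6 accept; narrow to specific types as needed.
        meta l4proto { icmp, ipv6-icmp } accept

        ip daddr 192.0.2.53 udp dport 53 accept    # DNS
        ip daddr 192.0.2.53 tcp dport 53 accept    # DNS over TCP
        udp dport 123 accept                       # NTP
        ip daddr 192.0.2.80 tcp dport 3128 accept  # HTTP proxy
        ip daddr 192.0.2.20 tcp dport 514 accept   # central rsyslog

        log prefix "out-drop: "
    }
}
```

Packets accepted by these chains are still evaluated by any later chains on the same hook, including the ones Docker manages, so this table can only tighten what Docker allows, not widen it.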

To expand, you can use privileged containers, the host network, capabilities, etc. if the software really needs it. In that case, Docker basically becomes an init system/service manager, but you get a single daemon managing everything.
I put a firewall ahead of the Docker host so that they aren't running on the same system. Docker can do what it wants to on the host without stepping on my firewall rules.
It makes sense but that's more overhead and the spirit of the post seems to be "can we just docker compose and be done with it?"
I use UFW, and this config: github.com/chaifeng/ufw-docker

The only modification is that I pin containers to an IPv4 address so I can limit the forward rule to that address.

Adding to other answers: many cloud providers, including more reasonably priced ones like Hetzner, offer firewall as a service, where you can configure the firewall there instead of on the OS itself.
I don't. I'd run other workloads on separate hosts.
firewalld supports docker and handles all of its routing/changes. I've standardized on using it in my environment.