| Sigh. 1. I would hope the default seccomp policy blocks AF_ALG in these containers. I bet it doesn’t. Oh well. 2. The write-to-RO-page-cache primitive STILL WORKED! It’s just that the particular exploit used had no meaningful effect in the already-root-in-a-container context. If you think you are safe, you’re probably wrong. All you need to make a new exploit is an fd representing something that you aren’t supposed to be able to write. This likely includes CoW things where you are supposed to be able to write after CoW but you aren’t supposed to be able to write to the source. So: - Are you using these containers with a common image or even a common layer in an image to isolate dangerous workloads from each other. Oops, they can modify the image layers and corrupt each other. There goes any sort of cross-tenant isolation. - What if you get an fd backed by the zero page and write to it? This can’t result in anything that the administrator would approve of. - What if you ro-bind-mount something in? It’s not ro any more. |
I see a lot of projects blocking those sockets in containers as a response to this exploit, but it seems rather strange to me. We're disabling a cryptographic performance enhancement feature entirely because there was a security bug in them that one time? It's a rather weird default to use. It's not like we're mass-disabling kernel modules everywhere every time someone discovers an EoP bug, do we? Did we blacklist OpenSSL's binaries after Heartbleed?
I suppose it makes sense as a default on vulnerable kernels (though people running vulnerable kernels should put effort into patching rather than workarounds in my opinion), but these defaults are going to be around ten years from now when copy.fail is a distant memory.