"two approaches but run data pipelines in a fully isolated environment - containers - which prevents user code from breaking free."
While (user) namespaces, rootless mode and cgroups do slightly reduce the attack surface, containers are still running on a shared kernel instance.
In Docker's default configuration, using host namespaces and allowing --privileged, anyone who can launch a container has full root-level access to read the disks of even the host machine via mknod, or even to update firmware. Let's hope that they are not using Linux bridges for containers too.
The belief that containers are somehow ultra secure will result in many breaches in the future. In theory, if you have control of the code, SELinux or AppArmor could help, but most people don't use them, and a major cloud provider's solution doesn't even support them.
It is scary how many install bases even add capabilities to the container daemons so that they can run some form of storage persistence etc...
The risk of containers can be mitigated to an acceptable level. But whenever I hear a company claiming that they are using containers because they are "secure", it is a huge red flag.
If you are a company making claims like the above, you are proclaiming that you either think that security through obscurity is a primary cybersecurity practice or you really don't understand how containers and namespaces work.
Thank you! :) I updated the article with a note on privileged containers.
Evaluating user code inside privileged containers is indeed a security nightmare. Fortunately, --privileged is not enabled by default, which is why I think that containers are quite secure by default.
Make sure you are at least using user namespaces, which drop the mknod cap by their nature, or better yet run in rootless mode.
I have filed several bugs that I know can result in breakout, but as I can't make myself disclose vulnerabilities, I have no stick to get them to change their 'won't fix' decisions.
k8s doesn't support user namespaces let alone user mount namespaces.
The point being that for k8s and Docker, any role that allows you to create pods/containers, or any role that can be used to compromise such a role over any number of hops, should be considered as having root permissions.
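To make that concrete, here is a rough sketch of how one might audit RBAC data for roles that can create workloads. The helper names and the resource list are my own assumptions, not anything from k8s itself; the rule dicts only mirror the shape of the "rules" entries in a Role or ClusterRole. It ignores apiGroups and does not follow multi-hop paths (e.g. a role that can edit role bindings), so treat it as a starting point only.

```python
# Hypothetical audit helper: flags RBAC roles that can create or modify workloads.
# Assumption: each role is a dict shaped like a Kubernetes Role/ClusterRole object.

# Resources whose creation or modification ends up scheduling pods (not exhaustive).
POD_CREATING_RESOURCES = {
    "pods", "deployments", "daemonsets", "statefulsets",
    "replicasets", "replicationcontrollers", "jobs", "cronjobs",
}
# Verbs that let a subject introduce or change a pod template.
WRITE_VERBS = {"create", "update", "patch", "*"}


def rule_can_create_pods(rule):
    """Return True if a single RBAC rule permits writing pod-producing resources."""
    resources = set(rule.get("resources", []))
    verbs = set(rule.get("verbs", []))
    touches_workloads = "*" in resources or bool(resources & POD_CREATING_RESOURCES)
    can_write = bool(verbs & WRITE_VERBS)
    return touches_workloads and can_write


def root_equivalent_roles(roles):
    """Return the names of roles that should be treated as root-equivalent."""
    return [
        role["metadata"]["name"]
        for role in roles
        if any(rule_can_create_pods(rule) for rule in role.get("rules", []))
    ]
```

Feeding it the parsed JSON of your roles and cluster roles gives a first list of names to review; anything bound to those roles deserves the same scrutiny as a root account.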
While I won't share any non-privileged breakouts, here is an example of how easy it is with the --privileged flag.
While I am not recommending it in general, AppArmor is fairly easy to develop CI friendly restrictions with and I would strongly suggest you protect the directory space and devices that you don't use with it.
Not perfect but it typically can help prevent leaks caused by adding features or configuration errors.
What provides additional security is runc using seccomp to make the container process take the one-way transition into a "secure" state, together with dropped capabilities.
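If you want to see what your runtime actually applied, the kernel reports the seccomp mode and the effective capability mask per process in /proc/<pid>/status. Below is a minimal sketch that parses that text; the function names and the short list of "risky" capabilities are my own choices for illustration, and the bit numbers are the ones from linux/capability.h.

```python
# Bit positions from <linux/capability.h>; only a handful are checked here.
CAP_BITS = {
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_RAWIO": 17,
    "CAP_SYS_ADMIN": 21,
    "CAP_MKNOD": 27,
}
# Values of the "Seccomp" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of a status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hardening_report(status_text):
    """Summarize seccomp mode, no_new_privs and selected effective capabilities."""
    fields = parse_status(status_text)
    cap_eff = int(fields.get("CapEff", "0"), 16)
    seccomp = int(fields.get("Seccomp", "0"))
    return {
        "seccomp": SECCOMP_MODES.get(seccomp, "unknown"),
        "no_new_privs": fields.get("NoNewPrivs") == "1",
        "risky_caps": sorted(
            name for name, bit in CAP_BITS.items() if (cap_eff >> bit) & 1
        ),
    }
```

Inside a container on Linux, calling hardening_report(open("/proc/self/status").read()) shows whether a seccomp filter is active and whether any of the listed capabilities survived. Keep in mind that the mask alone does not tell you how user namespaces limit what a capability can do.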
Hiding pids doesn't matter when any container can list /dev or look through /sys and /proc to find device major and minor numbers or to modify kernel parameters or files that are mistakenly given write access.
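A quick way to check what a given container can see is to walk /dev from inside it and print the device numbers. This is only a sketch using the standard library; the function name is made up for the example, and it does nothing beyond what ls -l /dev would show.

```python
import os
import stat


def visible_device_nodes(root="/dev"):
    """Return (path, kind, major, minor) for each block/char device under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISBLK(st.st_mode):
                kind = "block"
            elif stat.S_ISCHR(st.st_mode):
                kind = "char"
            else:
                continue  # regular files, symlinks, sockets, etc.
            found.append((path, kind, os.major(st.st_rdev), os.minor(st.st_rdev)))
    return sorted(found)
```

If the list from inside a container includes host block devices, that is a sign the container was given far more than it needs.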
The recent CVE that allowed overwriting the runc executable gives an actual case there.
Namespaces are more about decoupling and avoiding pollution than security.
Just like chroot, the shared kernel instance has a large attack surface, especially if you don't leverage all of the tools provided.
As you are effectively running arbitrary code from users, I would highly suggest you look into non container runtime protection.
It can be made reasonably safe but an overconfidence in containers being inherently secure will make you a target.
If you are on k8s, you should be using anti-affinity or taints to make sure containers running external user code are not running on the same nodes as other containers, or, better than that, have a dedicated k8s cluster for that need.
Especially if you have persistent storage, as user mount namespaces are new in the kernel and default mounts are typically implemented by granting CAP_SYS_ADMIN (see capabilities(7)).
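As a rough illustration of the taint approach, here is a sketch that builds a pod manifest which only lands on nodes reserved for untrusted code and drops everything it can in the container security context. The function name and the workload=untrusted key/value are hypothetical; the manifest fields themselves (nodeSelector, tolerations, securityContext) are standard pod spec fields. It assumes the dedicated nodes were both tainted and labelled with that same key/value.

```python
import json


def untrusted_pod_spec(name, image, taint_key="workload", taint_value="untrusted"):
    """Build a Pod manifest pinned to nodes reserved for untrusted user code."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {taint_key: taint_value}},
        "spec": {
            # Only schedule onto nodes carrying the matching label...
            "nodeSelector": {taint_key: taint_value},
            # ...and tolerate the taint that keeps every other pod off them.
            "tolerations": [{
                "key": taint_key,
                "operator": "Equal",
                "value": taint_value,
                "effect": "NoSchedule",
            }],
            # User code has no business talking to the API server.
            "automountServiceAccountToken": False,
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": {
                    "privileged": False,
                    "allowPrivilegeEscalation": False,
                    "runAsNonRoot": True,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


def to_manifest_json(spec):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(spec, indent=2)
```

The toleration alone only allows the pod onto the tainted nodes; it is the nodeSelector that keeps it off everything else, which is why the sketch sets both. None of this replaces a separate cluster, it just narrows what shares a kernel with the user code.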