| > Whether it's kernel/OS or a bare metal hypervisor isn't really all that meaningful: exploits and vulnerabilities exist for either. This is just not true, or at least it's extremely disingenuous. Container isolation relies on the Linux kernel. Other than seccomp-denied syscalls (which aren't a thing in k8s by default), any program in the container has access to the kernel's full syscall interface. The Linux kernel has a massive attack surface, especially for root users. VM isolation like Firecracker is much safer because the attack surface is considerably smaller. For one thing, you can sandbox the process inside the guest just as well as you could on the host, further limiting attack surface. But more importantly, an attacker has to break either (1) Firecracker or (2) KVM. Both are very small codebases. Firecracker is (1) written in Rust and (2) sandboxed aggressively. KVM has had very few public guest-to-host breakouts. You can read about one here: https://googleprojectzero.blogspot.com/2021/06/an-epyc-escap... So, to recap, we have "security boundary relies on a fully exposed Linux kernel" versus "security boundary relies on hardened, tiny, security-driven programs". It is not even close. > There should be proper hardware-level isolation here, depending on the scenario. Most cloud companies can't afford that though, because they're not rolling out their own hardware. Hence hardware vendors building hypervisor support in. |
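If you want to check the seccomp point for yourself, here is a minimal sketch (not anything official) that reports whether a process is running under a seccomp filter. It assumes a Linux host and reads the `Seccomp:` field of `/proc/self/status`; the helper name `seccomp_mode` and the `SECCOMP_MODES` table are my own, purely for illustration.

```python
from pathlib import Path

# Values of the "Seccomp:" field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Return the seccomp mode named in the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Seccomp":  # exact match, so "Seccomp_filters" is skipped
            return SECCOMP_MODES.get(int(value.strip()), "unknown")
    return "unknown"  # field absent, e.g. a kernel built without seccomp


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # the file only exists on Linux
        print("seccomp:", seccomp_mode(status.read_text()))
```

Run inside a container, a result of `disabled` would suggest no syscall filter is applied to that process, which is the situation described above. `filter` only says a profile is active, not how restrictive it is.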