I'm wondering, though, what value Kubernetes will add besides integrating with existing (presumably Kubernetes-based) infrastructure? At least, this is my understanding of the rationale for Kata containers. Other than that, it seems like it'd be just getting in the way...
I believe this work originated at Intel as "Clear Containers" (which I believe started life from an acquisition, but I could be mixing this up... my memory isn't what it used to be). Either way, it's great they are being used like this and at Nvidia (I know Alibaba Cloud also uses this tech).
Yes, Kata started as Clear Containers. And yes, the main purpose is compatibility with containers -- though generally speaking, adding layers to the cloud stack never helps to make a deployment more efficient. On kraft.cloud we use Dockerfiles to specify the app/filesystem, but then at deploy time automatically and transparently convert that to a specialized VM/unikernel for best performance.
Back when we did the paper, Firecracker wasn't mainstream, so we ended up doing a (much hackier) version of a fast VMM by modifying Xen's VMM; but yeah, a few millis was totally feasible back then, and still is now (the evolution of that paper is Unikraft, a LF OSS project at www.unikraft.org).
(Cold) boot times are determined by a chain of components, including (1) the controller (e.g., k8s/Borg), (2) the VMM (Firecracker, QEMU, Cloud Hypervisor), (3) the VM's OS (e.g., Linux, Windows, etc.), (4) any initialization of processes, libs, etc., and finally (5) the app itself.
With Unikraft we build extremely specialized VMs (unikernels) in order to minimize the overhead of (3) and (4). On KraftCloud, which leverages Unikraft/unikernels, we additionally use a custom controller to optimize (1) and Firecracker to optimize (2). What's left is (5), the app, which hopefully the developers can optimize if needed.
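To make the chain above concrete, here is a rough sketch that treats cold boot time as the sum of the five stages, assuming they run strictly one after another. The stage names and every number in it are made-up placeholders for illustration, not measurements of any system mentioned in this thread.

```python
# Stage names follow the five-part chain described above.
# All latencies below are hypothetical placeholders, not measurements.
STAGES = ["controller", "vmm", "guest_os", "init", "app"]


def cold_boot_ms(stage_ms):
    """Sum per-stage latencies in ms, assuming the stages run sequentially."""
    missing = [s for s in STAGES if s not in stage_ms]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    return sum(stage_ms[s] for s in STAGES)


def breakdown(stage_ms):
    """Return each stage's share of the total cold boot time."""
    total = cold_boot_ms(stage_ms)
    return {s: stage_ms[s] / total for s in STAGES}


# Hypothetical general-purpose VM: the guest OS and init stages dominate.
general_purpose = {
    "controller": 50.0,
    "vmm": 30.0,
    "guest_os": 100.0,
    "init": 40.0,
    "app": 10.0,
}

# Hypothetical specialized VM: only stages (3) and (4) are reduced.
specialized = {**general_purpose, "guest_os": 2.0, "init": 1.0}

if __name__ == "__main__":
    for name, cfg in [("general", general_purpose), ("specialized", specialized)]:
        print(name, cold_boot_ms(cfg), breakdown(cfg))
```

The point of the sketch is only that shrinking (3) and (4) leaves the controller, the VMM and the app as the remaining terms, which is why each of those needs its own optimization.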
LightVM states a VM creation time of 2.3ms, while Firecracker states 125ms from VM creation to a working user space. So this is comparing apples and oranges.
I know it's cool to talk about these insane numbers, but from what I can tell people have AWS lambdas that boot slower than this to the point where people send warmup calls just to be sure. What exactly warrants the ability to start a VM this quickly?
The 125ms is using Linux. Using a unikernel and tweaking Firecracker a bit (on KraftCloud) we can get, for example, 20 millis cold starts for NGINX, and have features on the way to reduce this further.