Hacker News
by niz4ts 863 days ago
As far as I know, Fly uses Firecracker for their VMs. I've been following Firecracker for a while now (even using it in a project), and it doesn't support GPUs out of the box (and the maintainers have no plans to support them [1]).

I'm curious to know how Fly figured out their own GPU support with Firecracker. In the past they've written some very detailed technical posts on how they achieved certain things, so I'm hoping we'll see one on their GPU support in the future!

[1]: https://github.com/firecracker-microvm/firecracker/issues/11...

The simple spoiler is that the GPU machines use Cloud Hypervisor, not Firecracker.
There has been weirdly little discussion on HN about Cloud Hypervisor. I guess because it's such a horribly bland non-descriptive Enterprise Naming name?

It looks pretty sweet. It's written in Rust and shares libraries with Firecracker and ChromeOS's crosvm, with more emphasis on long-running stateful services than Firecracker has.

https://github.com/cloud-hypervisor/cloud-hypervisor

https://github.com/rust-vmm

Unfortunately, Cloud Hypervisor does not use strong sandboxing/privilege separation like crosvm does.
For anyone else wanting to check on the status of this: it seems they're looking at a combination of seccomp, landlock and a systemd service instance per VM, with systemd doing DynamicUser, namespacing, and initial seccomp. Work seems to be happening right now, but of course it's telling and sad that it wasn't part of the original design.

https://github.com/cloud-hypervisor/cloud-hypervisor/issues/...
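
To make the "systemd service instance per VM" idea above concrete, here is a rough sketch that renders a systemd template unit as text. It is not taken from the Cloud Hypervisor project: the unit layout, the binary path, the `chv-%i` runtime directory name, the `kvm` group and the choice of `@system-service` as an initial syscall filter are all assumptions for illustration, and a real deployment would need to tune them.

```python
def render_vm_unit(binary: str = "/usr/bin/cloud-hypervisor") -> str:
    """Render a hypothetical template unit (e.g. chv@.service), one instance per VM.

    %i expands to the instance name and %t to the runtime directory root.
    Every value below is an illustrative assumption, not the project's config.
    """
    lines = [
        "[Unit]",
        "Description=Cloud Hypervisor VM %i (illustrative sketch)",
        "",
        "[Service]",
        # Run each VM under a transient, unprivileged user allocated by systemd.
        "DynamicUser=yes",
        # Assumption: /dev/kvm is accessible to the 'kvm' group on this host.
        "SupplementaryGroups=kvm",
        # Per-instance directory under the runtime root for the API socket.
        "RuntimeDirectory=chv-%i",
        f"ExecStart={binary} --api-socket %t/chv-%i/api.sock",
        # Namespacing and filesystem restrictions applied by systemd.
        "NoNewPrivileges=yes",
        "PrivateTmp=yes",
        "ProtectSystem=strict",
        "ProtectHome=yes",
        # Coarse initial seccomp filter; the VMM can tighten it further itself.
        "SystemCallFilter=@system-service",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_vm_unit())
```

With a unit like this installed as `chv@.service`, each VM would be started as its own instance (for example `chv@vm1.service`), so a compromise of one VMM process stays confined to that instance's user and namespaces.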

Way simpler than what I was expecting! Any notes to share about Cloud Hypervisor vs Firecracker operationally? I'm assuming Cloud Hypervisor being bulkier doesn't matter much compared to the latency of most GPU workloads.
They are operationally pretty much identical. In both cases, we drive them through a wrapper API server that's part of our orchestrator. Building the cloud-hypervisor wrapper took me all of about 2 hours.
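
For readers wondering what "drive them through a wrapper" could look like, here is a minimal sketch of one way to do it. It is not Fly's code: `VmSpec`, `plan_boot` and the two request builders are hypothetical names, and the endpoint paths and field names are recalled from each project's public HTTP API (served over a Unix socket), so check them against the version you run. The sketch only builds the request sequence and does no I/O.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[dict]]


@dataclass(frozen=True)
class VmSpec:
    """Hypervisor-neutral description of a VM (hypothetical type)."""
    kernel: str
    cmdline: str
    vcpus: int
    mem_mib: int


def firecracker_requests(spec: VmSpec) -> List[Request]:
    # Firecracker is configured piece by piece, then started with an action.
    return [
        ("PUT", "/machine-config",
         {"vcpu_count": spec.vcpus, "mem_size_mib": spec.mem_mib}),
        ("PUT", "/boot-source",
         {"kernel_image_path": spec.kernel, "boot_args": spec.cmdline}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def cloud_hypervisor_requests(spec: VmSpec) -> List[Request]:
    # Cloud Hypervisor takes one VM config document, then a separate boot call.
    # Assumption: a release that uses the "payload" section for the kernel.
    return [
        ("PUT", "/api/v1/vm.create", {
            "cpus": {"boot_vcpus": spec.vcpus, "max_vcpus": spec.vcpus},
            "memory": {"size": spec.mem_mib * 1024 * 1024},  # bytes
            "payload": {"kernel": spec.kernel, "cmdline": spec.cmdline},
        }),
        ("PUT", "/api/v1/vm.boot", None),
    ]


BACKENDS: Dict[str, Callable[[VmSpec], List[Request]]] = {
    "firecracker": firecracker_requests,
    "cloud-hypervisor": cloud_hypervisor_requests,
}


def plan_boot(backend: str, spec: VmSpec) -> List[Request]:
    """Return the API calls needed to boot `spec` on the chosen backend."""
    return BACKENDS[backend](spec)
```

The orchestrator side then only ever sees `VmSpec` and `plan_boot`; which hypervisor sits behind the socket is a per-machine detail, which fits the "operationally pretty much identical" description.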