by benpacker 917 days ago
Am I understanding correctly that because they map a “Pod” to a “Fly Machine”, there’s no intermediate “Node” concept?

If so, this is very attractive. When using GKS, we had to do a lot of work to get our Node utilization (the percentage of resources we had reserved on a VM that were actually occupied by pods) above 50%.

Curious what happens when you run “kubectl get nodes” - does it lie to you, or call each region one Node?

7 comments

GKE Autopilot is an attractive option here if you don't want to worry about node utilization and provisioning. Effectively you have an on-demand, infinitely-sized k8s cluster that scales up and down as you need new pods. There are some caveats, but it's an incredible onramp if you're coming from Heroku or a similar PaaS and don't want to worry about the infrastructure side of things: GitHub Actions building images and deploying a Helm chart to GKE Autopilot is a remarkably friendly yet customizable stack. Google should absolutely promote it more than it does. https://cloud.google.com/kubernetes-engine/docs/concepts/aut...
Unfortunately last I checked the compute pricing for GKE autopilot was almost double, so if you can beat 50% utilization, you might as well just keep the under-utilized Node around.
If this is “free GKE autopilot” (autopilot billed at the same price as regular Fly Machine compute), then that changes the way I think about Fly’s basic compute pricing a lot.

I would think they should highlight that a lot more in the product announcement!
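A rough sketch of the break-even arithmetic behind the "almost double" point above. The prices are made-up placeholders, not real GKE or Fly rates, and the function names are hypothetical. The only idea is that paying for a whole node at utilization u costs price / u per unit your pods actually request.

```python
def effective_cost_per_used_unit(price_per_unit: float, utilization: float) -> float:
    """Cost per unit of compute that pods actually request on self-managed nodes."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_unit / utilization


def break_even_utilization(standard_price: float, per_pod_price: float) -> float:
    """Node utilization above which regular nodes beat per-pod billing."""
    return standard_price / per_pod_price


# Hypothetical prices per vCPU-hour (assumptions for illustration only).
STANDARD_PRICE = 1.0
PER_POD_PRICE = 2.0  # "almost double"

threshold = break_even_utilization(STANDARD_PRICE, PER_POD_PRICE)
print(f"break-even node utilization: {threshold:.0%}")

for utilization in (0.4, 0.5, 0.7):
    standard = effective_cost_per_used_unit(STANDARD_PRICE, utilization)
    print(f"utilization {utilization:.0%}: nodes={standard:.2f} per-pod={PER_POD_PRICE:.2f}")
```

If per-pod billing were instead priced the same as regular compute, the break-even would sit at 100% utilization, which is the "free autopilot" case.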

Say more! What should we highlight more?
As someone not familiar with Fly's offering (but very interested for the same reasons as the post you're replying to!), a couple things come to mind if you're looking at convincing people familiar with k8s to move workloads here:

- https://fly.io/docs/ doesn't show any results when searching kubernetes or k8s or k3s.

- https://fly.io/blog/fks/ is self-admittedly snarky but also doesn't provide details about the product itself. It jumps straight into technical details - and while I like the openness about fault tolerance, there's no paragraph after the intro about what Fly Kubernetes is.

- What exactly does the combination of k3s and virtual-kubelet provide compared to standard k8s? Does it provide Secret and ConfigMap storage, namespaces, and all those expected things? Can we run things like the Kubernetes dashboard? cert-manager? nginx-ingress?

- On that note, what's the ingress story in general? Is Fly automatically routing traffic to the k8s cluster based on the ingress declarations? Are there limitations? Where are they documented?

- Most people running k8s will have fault-tolerant workloads, but reasonable expectations for pod lifetime and reliability of underlying "hardware" are nonetheless important. If I'm migrating from EKS or GKE and want to run a 24/7 background process, can I expect it to keep running on the same Fly Machine for weeks or months until updated? Or are there limits here? (This might be better documented for Fly Machine but it's worth documenting specifically in this context.)

Absolutely understand that this is an experimental work in progress. It's really cool work! But it's also impossible to even justify playing with as an experiment, with so many unanswered questions about where hard caps in the functionality may be hit.

If I use GKE or any other standard Kubernetes offering (excluding GKE autopilot for now), if I have a variable workload and I want Node-level autoscaling, I will probably pay between 1.5x-2.5x in compute costs above what my Pod requests sum to because of difficulty with Node utilization.

It seems like with FKS, my pods will map directly to Fly Machines billing, so there’s no compute that I’m paying for but not using.

GKE Autopilot is pretty much useless; there are very few cases where it actually turns out cheaper than simply using Cluster Autoscaler + Node autoprovisioning. Not only is the pricing absolutely absurd, they don't even allow normal K8s bursting behavior (requests need to be equal to limits), which means you not only end up paying more than for a regular K8s cluster but also need to heavily overprovision your pods.
Why would you use GKE Autopilot over Cloud Run?
Cloud Run is great if you just need to deploy a few services and expose their endpoints, and don't have a particularly complex backend service architecture.

But with more complex architectures, you'll end up implementing a sort of GKE-like layer over Cloud Run, at which point GKE would probably make more sense.

GKE lets you shell into containers, run all different kinds of workloads (e.g. no need for a separate "Cloud Tasks" system), supports stateful workloads, provides a standardized language for defining and deploying resources of all kinds (the k8s resource definition language), and as such integrates with standard gitops deployment systems such as ArgoCD.

My understanding is that Cloud Run is not suitable for stateful workloads (databases, etc.)
The node would be a virtual-kubelet. You can check out the virtual-kubelet GitHub repo for more info.

Interestingly, there are already multiple providers of virtual-kubelet. For example, Azure AKS has virtual nodes where pods are Azure Container Instances. There’s even a Nomad provider.

> So that’s what we do. When you create a cluster, we run K3s and the Virtual Kubelet on a single Fly Machine.

So probably a cluster per region. You could theoretically spin up multiple virtual-kubelets though and configure each one as a specific region.

> Because of kine, K3s can manage multiple servers, but also gracefully runs on a single server, without distributed state.

This would mean the control-plane would be on a single-server without high-availability? Although, I suppose there really isn’t any state stored since they are just proxying requests to the Fly Machine API. But still, if the machine went down your kubectl commands wouldn’t work.

The diagram on https://virtual-kubelet.io/docs/architecture/ makes me wonder whether it's possible to have a k8s cluster where the nodes are all virtual kubelets backed by different cloud providers (and then perhaps schedule loads preferentially with selectors)
I think it’s completely possible. Though, you’ll have to manage your own control-plane.

Azure AKS and EKS provide virtual-kubelet functionality in some form, but AKS is a managed control-plane where you can’t add nodes yourself and EKS only allows nodes in the same VPC.

Edit: It already is a thing. https://github.com/virtual-kubelet/tensile-kube

tensile-kube seems to be structured as a "k8s cluster of k8s clusters", with an upper kubemaster farming out resources to lower kubemasters (through virtual-node). I don't know if there's any particular reason to have that separation; possibly the lower kubemasters could be removed and you could just run a bunch of virtual-kubelets.
I think the biggest hurdle would be networking between the pods since they will be running on different cloud providers.
I've seen some people using wireguard for intra-cluster networking so that all their nodes can run pretty much anywhere.
Wouldn't the network cost be absurd in such a case? Not only would the pod-to-pod communication cost skyrocket, all the heartbeats, health checks, metrics, and daemonsets pinging each other would probably end up costing more than the CPU and memory.
> Had to do a lot of work to get node utilization ... higher than 50%

How is this the scheduler's fault? Is this not just your resource requests being wildly off? Mapping directly to a "fly machine" just means your "fly machine" utilization will be low.

I think there’s a slight misunderstanding - I’m referring to how much of a Node is being used by the Pods running on it, not how much of each Pod’s compute is being used by the software inside it.

Even if my Pods were perfectly sized, a large percentage of the VMs running the Pods was underutilized because the Pods were poorly distributed across the Nodes.
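To make that distinction concrete, here is a toy sketch with invented numbers. It assumes 4-vCPU nodes and a handful of correctly sized pods left scattered after some churn, then repacks them with a plain first-fit-decreasing heuristic. This is not how kube-scheduler or any descheduler actually works, just an illustration of node-level utilization.

```python
def utilization(nodes, capacity):
    """Fraction of provisioned node capacity covered by pod requests."""
    return sum(sum(pods) for pods in nodes) / (capacity * len(nodes))


def first_fit_decreasing(pod_requests, capacity):
    """Pack pods onto as few nodes as this simple heuristic manages."""
    nodes = []
    for request in sorted(pod_requests, reverse=True):
        if request > capacity:
            raise ValueError("pod does not fit on any node")
        for pods in nodes:
            if sum(pods) + request <= capacity:
                pods.append(request)
                break
        else:
            # No existing node has room, so "provision" a new one.
            nodes.append([request])
    return nodes


NODE_VCPU = 4  # assumed node size

# Each inner list is one node and the vCPU requests of the pods left on it.
scattered = [[1], [1], [2], [1]]
all_pods = [request for pods in scattered for request in pods]
repacked = first_fit_decreasing(all_pods, NODE_VCPU)

print(f"scattered: {len(scattered)} nodes, {utilization(scattered, NODE_VCPU):.1%} utilized")
print(f"repacked:  {len(repacked)} nodes, {utilization(repacked, NODE_VCPU):.1%} utilized")
```

The pods request the same 5 vCPU in both cases; only the number of nodes being paid for changes. With one machine per pod there is no node-level slack to pack in the first place.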

Is that really a problem in Cloud environments where you would typically use a Cluster Autoscaler? GKE has "optimize-utilization" profile or you could use a descheduler to binpack your nodes better
DX might be better I suppose, since you don’t have to fiddle with node sizing, cluster autoscalers, etc.

Someone else linked GKE Autopilot which manages all of that for you. So if you’re using GKE I don’t see much improvement, since you lose out on k8s features like persistent volumes and DaemonSets.

> we had to do a lot of work to get our Node utilization ... over 50%

Same, a while back you had to install cluster-autoscaler and set it to aggressive mode. GKE has this option now on setup, though I think anyone who's had to do this stuff knows that just using a cluster-autoscaler is never enough. I don't see this being different for any cluster and is more a consequence of your workloads and how they are partitioned (if not partitioning, you'll have real trouble getting high utilization)

I wonder how it copes with things like anti-affinity rules, where you don't want two things running on the same physical / virtual server for resilience reasons.
You wouldn’t use affinity rules anymore. The pods are scheduled on a single virtual-kubelet node, so if you use anti-affinity, scheduling would fail.
> You wouldn’t use affinity rules anymore

Point being: what if I wanted to do this? How could I make sure services were running according to the anti-affinity rules I provided? E.g. not on same physical machine; not on same VM; not in same datacentre; not in same region; etc.

If there were a virtual kubelet per unit of granularity (datacenter, in their case?) then you would be able to use affinity rules just fine.
Right. Though, the virtual-kubelets can actually be running on the same machine. They just need to be configured to have different node names.

The press release states that your k8s API is actually running on a single machine with k3s and a virtual-kubelet. So, I’m not sure if it’s one “cluster” per region, or one “cluster” with multiple virtual-kubelets for regions.

Either way, your FKS cluster control-plane would sit in a single region.
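A toy model of the anti-affinity point discussed above, assuming a required rule keyed on node name (one replica per node). The node and replica names are hypothetical, and this is a deliberately simplified stand-in for the real scheduler, not a description of how FKS behaves.

```python
def schedule_with_node_anti_affinity(replicas, node_names):
    """Place at most one replica per distinct node name.

    Returns (placements, pending): a replica-to-node mapping and the
    replicas that could not be placed.
    """
    free_nodes = list(dict.fromkeys(node_names))  # dedupe, keep order
    placements = {}
    pending = []
    for replica in replicas:
        if free_nodes:
            placements[replica] = free_nodes.pop(0)
        else:
            pending.append(replica)
    return placements, pending


replicas = ["web-0", "web-1", "web-2"]

# One virtual node for the whole cluster (hypothetical name).
single_node = ["virtual-node"]
# One virtual-kubelet per region (hypothetical names).
per_region = ["vk-region-a", "vk-region-b", "vk-region-c"]

placed, pending = schedule_with_node_anti_affinity(replicas, single_node)
print("single virtual node:", placed, "pending:", pending)

placed, pending = schedule_with_node_anti_affinity(replicas, per_region)
print("one kubelet per region:", placed, "pending:", pending)
```

With a single virtual node only the first replica lands and the rest stay pending; with one distinctly named virtual-kubelet per unit of granularity, the same rule can be satisfied.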

How do you forbid running two instances of the same service on one node without anti-affinity?
Traditionally, each node is its own machine. virtual-kubelet creates a virtual node that is a proxy to some other pod infrastructure. In the case of FKS, each pod in the virtual node is a machine (a node in the traditional sense), so it’s the equivalent of having anti-affinity on all pods with an infinite node pool.
If it is pod per VM, that would make it like EKS Fargate.
Is GKS some amalgamation of GKE and EKS?
Typo haha - I meant GKE. Fixed now.