by jasonrichardsmi 1580 days ago
Hi danpalmer, I kind of hit on this in the article. How do you think abstracting away the server adds any value in a "public cloud", where you can get bespoke VMs with no concern for the underlying hardware? Can you elaborate?

There are several layers of abstraction here, all bringing benefits.

Layer 1: bare metal servers, bare metal routers, hardware appliances, etc.

Layer 2: virtual servers, VLANs/security groups/etc, some appliances.

Layer 3: containers, container networks, storage resources, etc. Kubernetes, arguably Heroku.

Layer 4: functions as a service?

The article's suggestion, or at least how I read it, is essentially "Kubernetes is nothing new, we've had [layer 2] for years". However, I see Kubernetes as sitting squarely in "layer 3".

These layers are pretty subjective and all of the boundaries are blurred, but in my work they do require fairly different skill sets. I've never had the skill set to work at "1". I was fairly good at "2" for a while, but spent too much of my time on it and too little shipping software. So my workplace moved to Kubernetes, which let me, as an application engineer, do much less infra work, and do it at "3" when necessary.

To just write off everything in "3" as "virtualisation" ignores the significant step up in abstraction and the benefits it brings.

Process boundaries do imply a kind of virtualization; OP is not wrong there. What containers add, though, is comprehensive namespacing for the resources the OS manages on behalf of "virtualized" processes.
But the initial question was how it differs from public cloud, and this is not a difference. You can define your Kubernetes or your Terraform config and get whichever brand of logical isolation you prefer.
I can tell you what it does for me: Full disk on your VM? Nope. Storage is abstracted away, so your server will not fill up anymore. At worst, one service might break, and it might heal itself.

Is your VM/node broken? It will heal itself because you throw it away and the new VM/node is fixed.
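A toy model of the "throw it away and replace it" approach to broken nodes, as a Python sketch. `Node` and `NodePool` are hypothetical names; in a managed cluster the provider's node pool and Kubernetes controllers do this work, not your code.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    # Hypothetical node record: just a name and a health flag.
    name: str
    healthy: bool = True


@dataclass
class NodePool:
    desired: int
    nodes: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def reconcile(self) -> list:
        """Drop unhealthy nodes and create fresh ones until the pool is back at `desired`.

        Returns the names of the nodes that were thrown away."""
        removed = [n.name for n in self.nodes if not n.healthy]
        # No repair in place: broken nodes are simply discarded.
        self.nodes = [n for n in self.nodes if n.healthy]
        # Scale down if there are more nodes than desired.
        self.nodes = self.nodes[: self.desired]
        # Top up with brand-new nodes built from the same config.
        while len(self.nodes) < self.desired:
            self.nodes.append(Node(f"node-{next(self._ids)}"))
        return removed
```

For example, marking `node-1` unhealthy in a pool of three and calling `reconcile()` removes it and adds a new `node-3`. Nothing is debugged on the broken machine, which only works because the nodes hold no state worth keeping.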

It enforces the abstraction of Service and VM. You will not install normal software on that VM just because you can. You don't need to give a developer access to a VM, only for them to need root access, pile on dependencies, and never update it.

You no longer have dependencies to your VM because you can't have dependencies on your VM OS.

Abstracting it away from your VM also streamlines things like logfiles. You no longer need to collect all logfiles from VMs because you do it for your services, and for your services you only do it once (if at all: log to stdout and be done with it).
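What "log to stdout and be done with it" can look like in a Python service, as a minimal sketch. The logger name `app` and the format string are placeholders; the point is that the container runtime captures stdout, so the service never manages log files on the VM itself.

```python
import logging
import sys


def configure_stdout_logging(level: int = logging.INFO) -> logging.Logger:
    """Send application logs to stdout instead of a file on the VM's disk."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("app")  # placeholder logger name
    logger.setLevel(level)
    logger.handlers = [handler]  # replace any file handlers with stdout only
    logger.propagate = False     # avoid duplicate lines via the root logger
    return logger
```

Rotation, shipping and retention then become the platform's job, configured once for every service rather than per VM.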

None of this is done by k8s. Network storage, hardware virtualization, network virtualization? They all existed before.
None of this is true.
I'm describing real-life issues I have and have had.

Feel free to write more than 'None of this is true.' so that an actual discussion is possible.

Tx :)

I run k8s on bare metal, and I can say a full disk is certainly possible if you have a service logging a few MB/s. Things will break in fun and interesting ways, data will get irrecoverably corrupted, etc. Your entire cluster will probably even break if that node was the etcd leader. This is pretty easy to reproduce: saturate the network and watch the etcd leader spill its guts in your logs once the network buffers fill up.
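To put "a few MB/s" in perspective, here is the back-of-the-envelope arithmetic as a small Python helper. The disk size and write rate in the example are hypothetical, not figures from the comment.

```python
def hours_until_full(free_gb: float, write_mb_per_s: float) -> float:
    """Hours until `free_gb` of free disk is consumed at a constant write rate.

    Uses decimal units (1 GB = 1000 MB), as drive capacities are usually quoted."""
    seconds = (free_gb * 1000) / write_mb_per_s
    return seconds / 3600
```

With 100 GB free and a service logging 3 MB/s, `hours_until_full(100, 3)` is roughly 9.3 hours, so a node disk can fill well within a single day.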

> You no longer have dependencies to your VM because you can't have dependencies on your VM OS.

Your containers rely on the OS's kernel and whatever features it was compiled with.
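One quick way to see this for yourself, as a Python sketch: inside an ordinary container, `platform.uname().release` reports the host node's kernel, not one shipped with the image. Sandboxed runtimes such as gVisor may report something different.

```python
import platform


def kernel_release() -> str:
    """Kernel release string as reported by uname.

    In an ordinary container this is the host's kernel, since containers
    share it instead of booting their own."""
    return platform.uname().release
```

Running this in two containers built from different base images on the same node should print the same release string.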

> You will not install normal software on that VM just because you can

If you're paying through the nose for managed k8s, this is true. If not, you'll eventually need to login to a node and diagnose some issue, which means installing things on the node.

> You no longer need to collect all logfiles from VMs because you do it for your services and for your services

Whatever you installed to collect logfiles is getting them from the VM's disk (in /var/log/pods in k3s), unless your container is redirecting them somewhere that isn't stdout.
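To illustrate where those logs end up, here is a sketch of how a node-level collector might discover them. The directory layout is an assumption based on what kubelet typically writes under `/var/log/pods`; check it on your own nodes before relying on it.

```python
from pathlib import Path


def pod_log_files(root: str = "/var/log/pods") -> dict:
    """Group the per-container log files a node-level collector would tail.

    Assumed layout: <root>/<namespace>_<pod>_<uid>/<container>/<n>.log.
    Returns {(namespace, pod, container): [Path, ...]}."""
    found: dict = {}
    for log in sorted(Path(root).glob("*/*/*.log")):
        container_dir = log.parent
        parts = container_dir.parent.name.split("_", 2)
        if len(parts) != 3:
            continue  # not a pod directory in the assumed format
        namespace, pod, _uid = parts
        found.setdefault((namespace, pod, container_dir.name), []).append(log)
    return found
```

Splitting on the first two underscores works because namespace and pod names cannot contain underscores. Either way, the files live on the node's disk, which is the parent's point.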

> If you're paying through the nose for managed k8s, this is true. If not, you'll eventually need to login to a node and diagnose some issue, which means installing things on the node.

Managed Kubernetes on Amazon (EKS) is quite inexpensive: $0.10/hr * 24 hrs/day * 30 days/month = $72/month. Other costs are VMs, networking, and storage, which you would have allocated anyway. There are some downsides like forced upgrades, but cost is not one of them for our use cases.
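The same arithmetic as a tiny Python function. It uses the $0.10/hr rate and 30-day month quoted above; treat both as assumptions and check current AWS pricing. `Decimal` avoids float rounding on currency.

```python
from decimal import Decimal


def monthly_control_plane_cost(
    hourly_usd: str = "0.10", hours_per_day: int = 24, days: int = 30
) -> Decimal:
    """Monthly control-plane fee at a flat hourly rate."""
    return Decimal(hourly_usd) * hours_per_day * days
```

`monthly_control_plane_cost()` gives `Decimal('72.00')`, matching the $72/month figure.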

Incidentally, we never log in to Kubernetes nodes using tools like ssh. It's asking for security trouble to have those ports open.

It’s all true though.
With Kubernetes you don't have to configure log exfiltration, process management, SSH, host metrics, etc. You don't have to touch Ansible--there's no host management at all.

The stuff that you still have to configure (e.g., firewalls, NFS) is all configured through a consistent, declarative interface (Kubernetes manifests) rather than a dozen bespoke, byzantine formats or imperative commands.
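A simplified model of what "declarative" buys you: you state the desired set of resources, and tooling computes the difference from what is running. This is a hypothetical sketch, not how `kubectl` is implemented; for instance, `kubectl apply` only deletes resources that disappeared from your manifests when pruning is enabled.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Diff desired state (e.g. parsed manifests) against what is running.

    Keys are hypothetical resource identifiers like "Deployment/web";
    values are resource specs. Returns what to create, update, or delete."""
    return {
        "create": sorted(k for k in desired if k not in actual),
        "update": sorted(k for k in desired if k in actual and desired[k] != actual[k]),
        "delete": sorted(k for k in actual if k not in desired),
    }
```

The same diff logic applies whether the resource is a firewall rule or an NFS mount, which is what a consistent interface means in practice.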

Kubernetes is not quite that easy though. Out of the box, you get basically no isolation between anything, and you still have to deal with security contexts and have processes in place for keeping your container images secure. If you use community Helm charts your services may end up running with essentially random privileges that may easily conflict.

The declarative interface is going in the right direction (as far as YAML can), but configuration management for it is still unsolved. Backups are also often forgotten; they're very easy with virtual machines.

I suspect you may be confusing "cloud provider Kubernetes" (the topic at hand) with running your own Kubernetes on bare metal. The bare metal Kubernetes story still has a long way to go, but we're talking about public cloud providers.

> Out of the box, you get basically no isolation between anything

I'm pretty sure AWS Fargate and GCP's gVisor solve (or attempt to solve) isolation. Not sure about other cloud providers.

> you still have to deal with security contexts and have processes in place for keeping your container images secure

How do VMs help secure software artifacts beyond the security practices in the container ecosystem? And I would argue that "dealing with security contexts" is strictly better in Kubernetes than the equivalent in VMs if only because of the unified interface (Kubernetes manifests).

> If you use community Helm charts your services may end up running with essentially random privileges that may easily conflict.

You can run into the same issue with Ansible scripts on VMs. This isn't a Kubernetes specific issue--ultimately, all system administrators need to take care to run secure software on their systems. Neither Kubernetes nor VMs offer a silver bullet here.

> configuration management for it is still unsolved

If "configuration management" refers to configuration of the hosts, then yes, public cloud provider Kubernetes offerings solve for this--you don't have to manage the host configuration at all (unless you want to opt into it).

> Backups are also often forgotten; they're very easy with virtual machines.

The etcd backups are managed by the cloud providers, as are backups for mounted volumes. Not sure what backups you're thinking about.

They are as easy on k8s as they are on VMs.

Or 'can':

If you use a VM on AWS, you also need to know that you need to configure a vm snapshot (very easy, totally agreeing here with you).

But you can also use managed k8s from AWS, which you can also back up, since everything lives on PVs that support snapshots.

I don't want to compare a VM + snapshotting 1:1 with Kubernetes, though. It wouldn't be fair to k8s, and it wouldn't be fair to all the use cases that work very, very well on a single VM.

It's not just "abstracting the server". Kubernetes abstracts more than just "a server"; it works one level higher. It does this for storage, networking, compute, services, workloads, scaling, ... and all of it is exposed through a standardised API. This API forces you to standardise application deployments, making centrally managed logging, monitoring, tracing, ... a breeze. Once you have it working for one application, it'll work for all of them.

Do you need to run this on Amazon, Gcloud, Azure, one of the smaller cloud providers like DigitalOcean, or locally on Kind/k3s? It'll require very little work, if any, to get it working on any of these. Cloud specific services and persistent storage will be the main issues, but that's something you can't really get around.

Now is it perfect? Absolutely not, as with any tech, there will always be problems and bottlenecks. But it allows development to scale, not just the workloads, and the skills required are transferable, which makes it a much easier sell.

I completely agree with everything you said, except:

> Cloud specific services and persistent storage will be the main issues, but that's something you can't really get around.

That isn't necessarily wrong, but products like OpenShift Container Storage (now actually OpenShift Data Foundation) can provide a common API that erases the problem. ODF uses Ceph under the hood, so you can get block, file, and (S3-compatible) object storage no matter where you are.

Cloud-specific services are indeed a problem, but many of them have open-source/portable alternatives you can choose that run everywhere, such as Fission, RabbitMQ, Kafka (not my favorite), Argo CD, etc. Really, the things I run into most now are things like AWS machine learning services.

You don’t really have no concern for the underlying hardware. You pick and choose the CPU/GPU horsepower, memory, storage type and storage class, network transfer speed, and many other things.

It’s at the point of needing to scale horizontally where I begin to disagree with your premise. This is where you’ll typically get into proprietary and/or ugly offerings.

>You pick and choose the CPU/GPU horsepower, memory, storage type and storage class,

You still do with kubernetes

> network transfer speed, and many other things.

No, you don't. There's no slider for "network performance" on GCP, Azure or AWS.

>No, you don't. There's no slider for "network performance" on GCP, Azure or AWS.

Who said there was a slider? Network performance is one of the key filters on every major cloud provider. Not every instance type has the same network performance.

>You still do with kubernetes

Sort of. It can be way more abstracted away. What's the minimum I need to run a node? Okay great, now run as many nodes as needed, when needed. To set that kind of thing up with bare metal on AWS, for example, would require getting into some proprietary offerings and/or absurd complexity.

I had to actually go and look up what our default instance type was for our cluster. It's a rather useless fact, since it doesn't matter much compared to the number of active nodes. That's not true at all if you're directly managing VMs.

I'd absolutely never trade the complexity of kubernetes for the complexity of self-managing a horizontally scalable bare metal VM implementation. However, some people (obviously) disagree with that. To each their own.
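A rough sketch of the "what's the minimum I need for a node, now run as many as needed" idea: estimate a lower bound on node count from pod resource requests. The function and its parameters are hypothetical; the real Kubernetes Cluster Autoscaler simulates scheduling of pending pods rather than doing this arithmetic.

```python
import math


def nodes_needed(
    pod_requests: list, node_cpu: float, node_mem_gib: float, min_nodes: int = 1
) -> int:
    """Lower-bound node count for a set of pod resource requests.

    pod_requests: list of (cpu_cores, mem_gib) tuples. Each resource is
    treated as one big pool, so this ignores bin-packing, daemonsets and
    system reservations."""
    total_cpu = sum(cpu for cpu, _ in pod_requests)
    total_mem = sum(mem for _, mem in pod_requests)
    by_cpu = math.ceil(total_cpu / node_cpu)
    by_mem = math.ceil(total_mem / node_mem_gib)
    return max(min_nodes, by_cpu, by_mem)
```

For ten pods requesting 0.5 cores and 1.5 GiB each on hypothetical 4-core/16-GiB nodes, CPU is the binding constraint and the estimate is 2 nodes. Which instance type backs those nodes barely matters, as the comment above notes.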

Network performance is tied to the instance type on GCP, AWS and presumably Azure.
But there’s no slider. Typically you slide the instance size for other reasons.

Here’s the thing: either you care about it (and you can game the sizing of the instance) or you don’t and you run Kubernetes.

But if you don’t care, then it doesn’t matter. There’s no slider for you to care about. It is not extra overhead.

Why is this slider important? You’ve picked something very arbitrary here.
Uh. Because the parent said that kubernetes and VMs are different because "with VMs you have to configure things [..] like networking performance".

But you configure the exact same things as with VMs and kubernetes.

Network performance (as per OP) is not configurable on either.

You just accept whatever accidental default you happen to have, it's not a conscious decision people are making, and it's an awkward assumption to say that you have to think about it.

Because if you have to think about it: that doesn't go away with kubernetes anyway: if anything it probably gets worse.

There might not be on GCP, but there are on other providers (Alibaba Cloud comes to mind).
The principled way of "abstracting away the server" is in fact namespacing of all OS-managed resources, à la containers. This opens up possibilities like automated process checkpointing and migration, or even seamless vertical scaling to a multi-node cluster (as opposed to a single server node) via an SSI (single system image) environment.
Horizontal scaling of systems is not a way that kubernetes or containerization benefits over the cloud. Resource utilization on those horizontally scaled workers can be, however.
Wouldn't your example be horizontal scaling?
I think part of the confusion is that server can refer to both a physical box and a VM. Abstracting away servers is about not having to think about VMs. You have a service which requires some amount of compute, memory, and storage and you just want that service to run. There is value to not having to worry about provisioning a VM or administering it.
But you still have to do that in Kubernetes, unless you are running Fargate. Someone has to provision and maintain that machine, and in the process introduce a ton of administrative overhead.
That's true, but not a particularly interesting fact.

For example, if you use GKE (Google Cloud's K8s offering), you attach your K8s cluster to an auto-scaling node pool and it handles (de)provisioning of your VMs for you. You essentially don't care about the VMs, there's essentially no overhead.

If you are in a private cloud, this also creates a good "API boundary" between the team responsible for running hardware, and the team responsible for shipping software to run on that hardware. On the former side you can essentially just adopt a machine into the cluster and leave it, and on the latter side K8s lets you programmatically reference resources, but you don't need to know how/where they came from.

I would say it is an interesting fact, since it's this "good 'API boundary'" that, as you said, enables one to separate concerns, be it between different teams in an organisation or between a service provider and its users.

Yes, you don't need Kubernetes to come up with your own implicit or explicit API boundaries, and these might not be needed for smaller projects. I agree that Kubernetes is often used where it's not strictly needed.

There are things which strict abstraction, and with it, separation of concerns provide. The crucial point is that certain things are enforced.

Sorry, I meant not an interesting fact in public clouds where they take care of all the VMs for you, even without Fargate.

You're right though, the abstraction layer is very interesting for _enabling_ public clouds, and for private cloud team ownership.

It's not the same. You can easily run node pools automatically because your abstraction layer is k8s with containerd or Docker.

You also know that you can throw away VMs because they don't contain any state. You are not losing data just because you kill a VM or a VM breaks.

It is way easier to just spin up n nodes and provision them all equally than whatever you did before.

In my team, we can manage way way way more nodes than we ever could. We spin up 100 VMs and destroy them on a regular basis automatically.

Gardener for example supports autoscaling on bare metal. The whole ecosystem is providing tons of great options.

I think there is a healthy dose of anti-vm bias in your viewpoints.

Lauding Kubernetes because ops work took too much of your time is just shifting your burden elsewhere, even if that means paying a bit more for an offering like ECS fargate.

Any environment with configuration management can treat instances as ephemeral. It’s a best practice.

I view docker more as a package manager, no more dependency hell.

In any event, K8s is sprawling; it will soon be too complex for its own good, assuming it isn't already.

I only answered the question why it is different with kubernetes.

I don't have anything against VMs. Feel free to click yourself a VM on any cloud provider and use it however you like.

K8s abstracts VMs away, and I have had real issues maintaining VMs: Docker filling up the node with logs, being unable to upgrade the base OS due to Python dependencies, managing the same VM stack through Ansible and everything Ansible or Chef brings to the table.

There are plenty of self-healing mechanisms in place that solve unfortunate issues. Memory? The service restarted, was offline for 3 minutes, and is working again. Node disk full? Pods get scheduled elsewhere, a new node comes up, done. Updating/upgrading nodes? The node pool does it for me.

For me, k8s has two real issues: memory (swap support is finally a work in progress!) and stateful workloads like databases. But the concept of an operator points to a bright future.

k8s also does one thing very nicely: it enforces certain aspects that are a pain in the ass later. That VM which wasn't updated for years and ran just fine? Now there is an issue and it needs to be fixed ASAP. But the Debian repositories are no longer available. I have to fix the apt sources list first, then fix dependencies, and then restart it.

That's a PaaS, and those existed long before k8s.
A good mental model is to think of K8s as a portable PaaS with a more well-defined API. That's a good thing, not a criticism of K8s.