by jrockway 2116 days ago
I don't think better abstraction is going to help the underlying problem: there are a lot of questions to answer, and you don't know the answer.

Without going into the complexity of Deployments, consider the lowly Pod. What configuration does your app need? What is the name of the container that contains it? How much memory does it use? How much CPU does it need? What ports does it listen on? What HTTP endpoint handles the health check? Does that endpoint test liveness or readiness? What filesystems does it need? What setup needs to be done before the main container runs? Does it need any special resources like GPUs? The list goes on.
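To make that list concrete, here is a rough sketch of a Pod manifest that answers each of those questions. Every name, image, path, and number in it is a placeholder chosen for illustration, not a recommendation, and the GPU line assumes the cluster has the NVIDIA device plugin installed.

```yaml
# Illustrative only: all names, images, paths, and quantities are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  initContainers:               # setup that runs before the main container
    - name: init-data
      image: busybox:1.36
      command: ["sh", "-c", "echo ready > /data/marker"]
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: foo                 # name of the container that holds the app
      image: my.registry.io/foo:1.0.0
      env:                      # configuration the app needs
        - name: FOO_MODE
          value: "production"
      ports:
        - containerPort: 8080   # port the app listens on
      resources:
        requests:               # what the scheduler reserves for the container
          cpu: 500m
          memory: 256Mi
        limits:                 # ceiling enforced at runtime
          memory: 512Mi
          nvidia.com/gpu: 1     # special resource; assumes a GPU device plugin
      livenessProbe:            # failing this restarts the container
        httpGet:
          path: /healthz
          port: 8080
      readinessProbe:           # failing this removes the Pod from Service endpoints
        httpGet:
          path: /ready
          port: 8080
      volumeMounts:             # filesystems the app needs
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}
```

Almost every value here is a guess until you have watched the app run, which is the point being made.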

The problem here is that when you're writing a Pod spec, you're building a single-purpose computer from scratch. In the traditional UNIX world, people answered most of these questions for you. How much RAM can my app use? However much I plugged in. How much CPU can my app use? All of them. What ports does it listen on? Any of them from 1024-65535. What filesystems does it need? Whichever ones I set up in /etc/fstab.

I don't think it's a stretch to call UNIX's "yolo" approach problematic. It is great when you have one server running one app, but servers have gotten gigantic (with pricing to match) while applications have largely stayed the same size. This means you have to pack multiple apps onto one physical server, and to do that, there have to be rules. When you write a Kubernetes manifest, you are just answering every possible question upfront so that the entire system runs smoothly even if your individual component doesn't. It's the cost of having small apps on big computers.

The problem comes from applications that you didn't write, or don't fully understand. Before you can understand how the application behaves, you have to write a manifest. But you don't know the answers to questions like how much CPU you're going to use, or what the worst-case memory usage is. This causes a lot of cognitive dissonance, because the entire file is you admitting to the computer that you have no idea how to configure it. No abstraction layer is going to fix that problem, except by hiding those uncomfortable details from you. (And you will always regret using the "yolo" defaults -- who hasn't tried to SSH into a broken server only to have Linux helpfully OOMKill sshd or your bash instance when you're just trying to kill your malfunctioning app?)

This is largely the fault of application developers. They aren't willing to commit to reasonable resource limits because they don't want to handle support requests that are related to underprovisioning. My experience is that applications that set limits pick them wrong. For example, GCP and DigitalOcean's managed Kubernetes offerings both install monitoring agents to support their dashboards; these apps ship with limits that are too low and any reasonable Prometheus installation will notice that they are being CPU throttled and warn you about it. Now you have to waste your day asking "is this a real problem?"

Many open-source apps go the other way and pick resource limits that truly encapsulate the worst case and require individual nodes that are many times larger than the entire cluster. Yes, it would be nice if I gave each pod 32 CPUs and 128GiB of RAM... but I don't want to pay $2000/month/replica thankyouverymuch. (I've been on the other side of that where resources didn't cost me real money and happily used terabytes of RAM as cache.)

Application-level configuration is also not in a great state. Everyone tries to sell you their curated defaults so they don't have to write any documentation beyond a "quick start". (I'm as guilty of that as anyone in fact!) The application will have some built-in defaults (so the developers writing the app can just "go run main.go" and get the config they need). Then someone comes along to make a Helm chart for you, and they change the defaults so that their local installation doesn't need any customization. This only causes problems because instead of an undocumented underlying application, now you have that AND an undocumented abstraction layer. You may find the answer to your question "how do I configure FooApp to bar?" but have no way of communicating that config through the Helm abstraction layer because the author of the Helm chart never thought anyone would do that.

This rant has gotten quite long so I'll wrap it up. No abstraction layer is ever going to make it so you don't need to answer difficult questions. The actual list of questions to answer is available through "kubectl explain pod.spec" and friends, however.


In my opinion, Kube's yaml blobs are comparable to binary programs. An executable that has its settings, paths, ports, memory limits, etc. hard coded inside of it is an obvious code smell.

Tools like Helm solve this problem.

Abstractions have a cost and a value. Programming languages are more expressive than writing out binary opcodes by hand, but don't really remove any features from the underlying machine. If your machine can do something, your C program can probably do it too. So we use C. (It even ADDS the value that you can describe a procedure without knowledge of the underlying hardware. It may not be optimal on every machine, but at least you have a starting point for optimization!)

Helm, like any abstraction, has costs and values. For example, it's very valuable to be able to encode something like "my app consists of a frontend and a backend that run in the same Pod and must each use the same version of the code":

    apiVersion: v1
    kind: Pod
    metadata:
      name: foo
    spec:
      containers:
        - name: frontend
          image: my.registry.io/foo-frontend:{{ .tag }}
        - name: backend
          image: my.registry.io/foo-backend:{{ .tag }}
Now your frontend and backend containers can't get out of sync, and it saves someone from having to manually ensure that they are in sync. There's no way to do it wrong! You supply {{ .tag }} and it remembers the constraint that you required.

The problem with this abstraction is that there's no escape hatch. If you wanted to run different versions of foo-frontend and foo-backend, there is no way to say "but no, really, this time I'm violating the rules for a good reason". You've reduced the features available... the only way forward is to start over with nothing.

The result of this is that every individual Helm chart has to account for every possibility that the manifests could possibly encode, and invent their own programming language that is identical in expressiveness to the underlying manifests. And they do! Differently every time! For example, if I wanted to allow people to override the container images, I'd have to make my template look like:

    containers:
      - name: frontend
        {{ if .tag }}
        image: my.registry.io/foo-frontend:{{ .tag }}
        {{ else }}
        image: {{ .frontend_image }}
        {{ end }}
      - ...
Now it's possible, but not in a way that anyone could search for on the Internet. You will have to read the code or hope I wrote documentation for my ad-hoc Kubernetes extension.

I think we can all agree that didn't save anyone any time or effort.

This example conveniently flows into my other complaint with Helm. The "yaml files" that declare templates aren't actually valid YAML. You can't use something like `prettier` to autoformat them. You can't use the YAML language server to provide code advice as you type. You can't use `kubeval` to validate them. You are throwing all of that away to Build Your Own Thing. It is actually very insidious and for that reason I consider Helm to be more harmful than helpful. It isn't an abstraction, it's just a macro that might be good for one person the instant they happened to type something in.

The other problem is that Helm charts have no upgrade path. They are only designed for "please explode this project into my cluster, I promise to clean it up In The Future". It never gets cleaned up and brings a little piece of un-updated Windows 95 nostalgia right into your cluster.

Helm is actively harmful. And people love it, because it saves them a tiny bit of time one day at the cost of a lot of time in the future.

Templating within yaml is an absolute mess; I have harped on Ansible in the past for the same reason. However, dealing with that once is far, far better than dealing with thousands of hand-maintained kube yamls multiplied by your number of environments. Helm of course has its own pain points, but that doesn't subtract from the fact that it solves problems.

I see putting your example of two, unrelated, containers in the same pod as the same binary problem I mentioned earlier. I get your point, but it's a scenario one must wedge themselves into by making other poor choices. Why must the frontend and backend be the exact same version? The most obvious possible reason could be that the API used between them isn't versioned. Or maybe there's not even an API!

> You will have to read the code or hope I wrote documentation for my ad-hoc Kubernetes extension.

No, helm's approach to inserting variables into templates means this isn't the case. Every option and default appears in values.yaml and it's a one stop shop to see everything you can customize. The code example you wrote could be better written as:

      containers:
        - name: frontend
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
With values.yaml having

    image:
      repository: my_default_image
      tag: my_default_tag
Note that values.yaml ships with the chart, and values passed to helm's cli can override individual values in it.
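As a small sketch of what such an override can look like, assuming the values.yaml shown above: the file name my-values.yaml and the tag are made up for illustration.

```yaml
# my-values.yaml (hypothetical file name): only the keys listed here are overridden.
image:
  tag: "1.2.3"   # placeholder tag; image.repository keeps the chart's default
```

Passing this file with `-f my-values.yaml` on `helm install` or `helm upgrade`, or setting the single key with `--set image.tag=1.2.3`, should render the image as `my_default_image:1.2.3`, since Helm merges the supplied values over the chart's values.yaml key by key rather than replacing the whole `image` map.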

I'm not sure what you mean about helm charts having no upgrade path. IME you can un-deploy, upgrade, and rollback helm deployments and it takes care of adding/removing kube resources that were also added or removed in the yaml for you. [1]

[1] https://helm.sh/docs/helm/helm_rollback/