Unhygienic string templating of a whitespace-sensitive language is a truly new hell for those who have not experienced it before. Quite an astonishing decision.
I remember being blown away by this. Unhygienic templating and the lack of template and serialization boundaries have been such a consistent disaster in our field for so many decades that it is hard to believe we are still dealing with this kind of design error in relatively new technology.
My feeling is we will probably all look back at the over-use of text based DSLs and configuration languages as a giant mistake. Not just with respect to K8s, but IaC, CI/CD config and the rest of the DevOps YAML/config language mess. In retrospect, it has been a case of simplistic instead of simple. "Declarative" config languages are hello world optimized, and what looked great at the beginning of the S-curve is starting to look pretty damn bad.
You can with Pulumi, for example. I've only used it for provisioning a Digital Ocean droplet before, not for Kubernetes, but it was relatively painless. (Although I do find approximately one new panic in Pulumi every time I upgrade to a new version.)
Indeed, as someone who maintains a helm package it's mind-boggling. When I've been able to build k8s tooling from scratch, I've been reasonably happy with jsonnet [0], which is a constrained programming language designed for configuration. It has the property that it will always produce valid JSON.
Seconded on jsonnet. I’ve started using it to generate json (that usually goes to an API) in gitlab ci pipelines. I use it to merge “gold standard” boilerplate configs with user changes.
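For anyone who has not used jsonnet, the merge described above (boilerplate defaults overlaid with user changes) can be roughly sketched in Python. This does not reproduce jsonnet's own merge semantics; `deep_merge`, `GOLD_STANDARD`, and the keys are hypothetical names chosen for illustration.

```python
import json


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical "gold standard" boilerplate and one user's changes.
GOLD_STANDARD = {"replicas": 2, "resources": {"cpu": "100m", "memory": "128Mi"}}
user_changes = {"resources": {"memory": "512Mi"}}

payload = deep_merge(GOLD_STANDARD, user_changes)
print(json.dumps(payload, indent=2))
```

Because the result is built as data and serialized once at the end, the output is valid JSON by construction, whatever the user changes contain.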
JS and TS aren't configuration languages. If it wasn't for JS's early dominance and too-big-to-fail status on the web, we probably wouldn't use it for programming much. Platform and operations engineers often don't know JS well, favoring bash, python, and others for programming. Some exceptions exist, like full stack node shops.
Configuration languages like yaml, HCL, etc. are more reasonable alternatives.
yaml is a tree serialization format with some human-targeted ergonomic features like comments and multiline strings built in.
it'll work for simple configuration files just as well as format-less .ini files will. for complex configurations, even xml is better, and that's saying a lot.
in practice for object graphs of any sort of complexity, like cloudformation or k8s configs, you want a programming language which can reduce the kolmogorov complexity of your configuration, because that dominates ops in the limit. or, IOW, configuration is code is configuration is code is ...
go and jinja "templates" have always been ever so much fun. It depends on the complexity stored in said "configs"; what reads the configs and what writes to the configs.
as a human i want to be able to both read and write, easily, by knowing wtf is going on for the input(s), and to understand clearly within the config-at-hand, what does what etc. (without 5x paragraphs per 1 key-value tuple).
then insert more templates to generate more templates, and then it is a fun spiral
k8s reminds me of those 20min long commercials combined with the energizer bunny, "but wait there's more" .. "and there's more" .. "and there's more" - as with orchestrating infrastructure as microservices, there's always more and more complexity to add to the monster. More layers, security, networking, etc., etc.
well, there's more, because operating software is exactly like that and k8s tries to API-fy that, in my opinion with considerable success.
yes, the actual descriptors are atrociously hideous, but it's okay, it's low-level, evolving pretty quickly, and there are nice high-level representations -- https://cdk8s.io/docs/latest/plus/
so, yes, of course, compared to FTP copying PHP files into cgi-bin k8s is more complex, but the feature set is also different.
of course, not everyone needs declarative gitops-based blue-green deployment with pristine dev/demo/staging/UAT envs on each new PR. and usually when people think they do it's mostly just FAANG envy :)
but, speaking from experience, setting up a k3s cluster is easy, cheap, and deploying things on it with a "kubectl apply" is also easy. setting up CronJobs to do backups to some S3-compatible thing is also quite doable, and so on. and you end up with a big bag of YAML. is it better than snapshotting a VM? who knows!
Thanks, that little twitch under my eye just started up.
I worked at an org where an admin went a little rogue and updated the config without using the macro file. Another admin didn't realize. Well, didn't realize until it was too late.
It's not really astonishing. People don't usually have knowledge beyond a quick google search. If you google XML, you'll see a lot of negative sentiment about XML specifically related to mid-00s Java frameworks and some people chunk this fact as "XML old and bad". If you google YAML, you'll see that lots of people like using it for relatively simple things (Rails config files...) and some people will chunk that fact as "YAML good".
For me, one of the true advancements of JSON/YAML over XML is how its features are orthogonal to each other
I don't have to ask "should this be a tag property, or a subtag"?
Which, to your point, isn't to say that YAML is "good", but I think there was at least an advancement through minimalism.
Right tool for the job. I think XML is too much tool for most jobs, but there's a huge gap between JSON/YAML and XML, so in 99% of cases I'd rather err on the "too much" side.
defo agree, mate. if data consumed by a computer is really crucial, I'd want to be sure to use a format that has a clear AST, schema, or whatever ya call it. now I'd rather use xml / clojure's data s-expressions / even json over yaml. it's just a gut feeling, for the most part.
YAML and other human-friendly presentation/editor formats are cool, but they need to come with a schema/validation/types. k8s APIs have a bit of validation, but there are frustrating gaps.
That has been one of the most annoying aspects of dealing with helm templates. It's like that USB plug you can never insert the right side up, only it has like a half-dozen sides and you have to cycle through them all to figure out the right one.
I don't think I'll ever understand why YAML was chosen as the language of choice for all things devops, and why so many startups have sought to augment yaml with syntactical hacks
I mean, as soon as you template it and need to do `{{ something | indent 4 }}` or some shit to make the template work you know you're on a bad track.
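To make the failure mode concrete, here is a small Python sketch (not Helm or Go templates; the fragment and keys are made up). It shows naive substitution breaking the nesting, the manual re-indent that `indent 4` stands in for, and the alternative of building data and serializing it once. JSON is used for the last step only because it is in the standard library.

```python
import json
import textwrap

# Hypothetical multi-line fragment that should sit under spec.resources.
fragment = "limits:\n  cpu: 100m\n  memory: 128Mi"

template = "spec:\n  resources:\n    {fragment}\n"

# Naive substitution: only the first line of the fragment picks up the
# template's indentation, so "cpu" and "memory" land at the wrong depth.
naive = template.format(fragment=fragment)

# The manual fix: re-indent every line of the fragment by hand, and hope
# nobody changes the nesting depth of the template later.
fixed = "spec:\n  resources:\n" + textwrap.indent(fragment, " " * 4) + "\n"

# The structural alternative: build data, serialize once, never count spaces.
document = {"spec": {"resources": {"limits": {"cpu": "100m", "memory": "128Mi"}}}}
serialized = json.dumps(document, indent=2)

print(naive)
print(fixed)
print(serialized)
```

In the naive output the `cpu` line has two spaces of indentation, which makes it a sibling of `resources` rather than a child of `limits`.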
I don't really think YAML is the problem, it's the string-based templating. I'd like to see Emrichen or something like it become more common. And Emrichen is format-agnostic, you can write your stuff in json or YAML and it looks and works pretty similarly. (Although I stick to yaml.)
100%. Target unaware templating is always a mistake. It is unfortunately common in our industry because it is easy and works most of the time. But this is also why SQL injections and XSS are among the most common vulnerabilities. SQL injections are getting better because people are more often using parameterized queries which never need to actually encode the values into the "template" and XSS is getting better because most big frameworks now have target-aware templating that properly serializes values. But these are both hugely common issues. Look up a tutorial about how to use SQL or make your first website and more likely than not you will see examples that have vulnerabilities. Our whole industry is teaching the wrong thing by default, then hoping to fix it later.
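A minimal sketch of the difference, using Python's built-in `sqlite3` module with an in-memory database. The table, rows, and hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

# Hypothetical attacker-controlled input.
hostile = "nobody' OR is_admin = 1 --"

# Target-unaware templating: the value is pasted into the SQL text, so its
# quote character ends the string literal and the rest is parsed as SQL.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the value travels separately from the statement and is
# never parsed as SQL, so no encoding step exists to get wrong.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(unsafe_rows)  # the admin row leaks
print(safe_rows)    # no user has that literal name
```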
String concatenation may have been a mistake. The concept of a string may have been a mistake. Every sequence of bytes has some structure, and in order to mash two "strings" together they need to be serialized in the correct way. Even when building error messages it would be nice if you can reliably identify the "chrome" from the "content".
Also look at terminal escape sequences. When we print text to a terminal we should probably be replacing non-printable characters with some sort of encoding so that 1. the reader can understand that these are from the content, not the application, and 2. the content cannot do stuff like delete the output line to trick the reader.
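One way this could look, as a sketch rather than a complete sanitizer: `show_untrusted` is a hypothetical helper that keeps printable characters and newlines and renders everything else as a visible escape. Literal backslashes are doubled so that escaped output stays distinguishable from content that merely contains a backslash.

```python
def show_untrusted(text):
    """Render control characters as visible escapes instead of sending them to the terminal."""
    out = []
    for ch in text:
        if ch == "\\":
            # Double literal backslashes so the escapes below stay unambiguous.
            out.append("\\\\")
        elif ch == "\n" or ch.isprintable():
            out.append(ch)
        else:
            # For example ESC becomes the four visible characters \x1b.
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)


# Carriage return plus an "erase line" sequence: printed raw, this would
# replace "build failed" with "build ok" on many terminals.
hostile = "build failed\r\x1b[2Kbuild ok"
print(show_untrusted(hostile))
```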
Every time you put two strings together you should think about how you need to properly encode one into the other. Output unaware text templating almost always fails this because it relies on the user to do this for every single interpolation, and that is doomed to fail.
It's "declarative" which is supposed to be good, and you can represent complex data structures without any annoying closing braces (like json) or tags (xml).
Plus, well, there's a disturbingly high number of people who think that semantic whitespace is a positive.
This is ridiculous. Indentation is a visual concern for humans and braces are a semantic concern for a machine. Code only needs the latter to actually execute.
You can prefer them be coupled for taste reasons but they are fundamentally different. Code minification is a practical example where you can save many characters by omitting whitespace.
This is like tabs vs spaces. The white spaces crowd prefer the trade offs, and the tab crowd don't. But the white space crowd can't argue the perks of tabs - they exist.
The tabs/spaces war was fought because of IDE choices that restricted your ability for sane tab rendering defaults, and simple overriding of tab size.
Now that terrible text editors won, I have to watch a 2 Vs 4 space war.
If only we could have a character that could represent indentation and people could set the rendering so they can visualise it in their own preferred way.
And folks who want to remove braces should be pointed to the Apple certificate snafu.
These optimizations for some perceived ergonomic win almost always make terrible tradeoffs versus using a good well established data format. And especially systems which favor human consumption but create extreme difficulties for machine handling, those are the worst!
Yaml being such a non-Context Free Grammar is a huge pain. There's so much state in the parser. It only gets worse from there. Yaml has all kinds of wild crazy capabilities. References, a variety of inline content blocks, and weird ways to invoke stuff?? GitHub yesterday did a code review of Frigate, an enormously popular surveillance video analysis tool that's heavily downloaded, and found, oh yes, a huge glaring yaml bug allowing remote execution, because executing arbitrary code is just built right in to yaml amid 3000 other crazy hacks & who would have known to go look for & disable that capability?! https://news.ycombinator.com/item?id=38630295 https://github.blog/2023-12-13-securing-our-home-labs-frigat...
Typing is not the problem (even though I see so many people just terrible beyond words at navigating project structures or the command line... Improve! Some day!).
I do think there's a power to the readability that makes it more approachable (but which eventually burns you). We were rewriting our AST at Kurtosis last year, and the default choice we were going to go with was of course YAML. But we came across a GitHub issue from DroneCI (who also started with YAML) that said something like, "we started with YAML, and we learned that you always eventually want to add more complex logic on top. Go with a Turing-complete language to begin with, else you'll be in the CircleCI trap inventing a language via YAML DSL."
We decided to go with Starlark as the base language for our DSL, and we've had a consistently great experience. Users report that it's very approachable, and the starlark-go library is very pleasant to deal with.
I totally agree that string-templating into a data serialization format is a mistake. But you can make life dramatically easier on yourself by doing `{{ something | toJson }}`. In fact write a linter that every single substitution is followed by `| toJson` and you will save yourself a lot of headaches.
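The same idea sketched in Python, with `json.dumps` standing in for `toJson`. The template and the hostile value are made up, and the sketch assumes the YAML parser on the receiving end reads a JSON double-quoted string as a single scalar, which holds for ordinary strings.

```python
import json

template = "metadata:\n  name: {name}\n"

# Hypothetical value containing a newline and something that looks like a key.
hostile = "web\n  admin: true"

# Raw substitution: the newline in the value creates a brand new key.
naive = template.format(name=hostile)

# JSON-encoding first: the value becomes one double-quoted scalar on one line.
quoted = template.format(name=json.dumps(hostile))

print(naive)
print(quoted)
```

The raw version renders three lines, the last of which is an `admin: true` key the template author never wrote. The encoded version stays at two lines, with the newline carried inside the quoted string.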
The main issue is that it makes it more difficult to mix hardcoded and inserted values.
Also the small technical concern that YAML isn't actually a superset of JSON. (But you are far less likely to hit these cases than other escaping bugs).
Dedicated config languages are the best usually. Jsonnet/Cue/friends.
Failing that if you actually need/want a procedural/non-pure language then I think Kotlin or Ruby take the cake. Both have extremely strong support for DSLs which IMO is key to reaching a modicum of usability.
Starlark is nice as well, it’s syntactically based on Python, and behaves a lot like regular procedural languages, but it’s meant to provide a pure and safe environment for configuration and be embeddable.
Big +1. We switched to Starlark for our DSL last year and have been very pleased. Users who've never used Starlark before come in with some 'what, another language?' trepidation, and end up pleasantly surprised.
Since using Bazel a bit I have grown an appreciation for Starlark also. The big thing is that list and dict comprehensions are a really nice fit for these types of tasks.
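As a rough illustration of why comprehensions fit this kind of task, here is a sketch in plain Python (not Starlark, though the comprehension syntax is similar). The service names, ports, and the `deployment` helper are hypothetical.

```python
import json

# Hypothetical services (name -> port) and target environments.
SERVICES = {"api": 8080, "worker": 9090, "web": 3000}
ENVIRONMENTS = ["staging", "prod"]


def deployment(name, port, env):
    """Build one deployment record; prod gets more replicas."""
    return {
        "name": "{}-{}".format(name, env),
        "port": port,
        "replicas": 3 if env == "prod" else 1,
    }


# One list comprehension expands 3 services x 2 environments into 6 records.
deployments = [
    deployment(name, port, env)
    for env in ENVIRONMENTS
    for name, port in SERVICES.items()
]

# A dict comprehension derives a lookup table from the same data.
ports_by_name = {d["name"]: d["port"] for d in deployments}

print(json.dumps(deployments, indent=2))
```

Adding a fourth service or a third environment is a one-line change here, where the equivalent hand-written YAML would grow by several near-identical blocks.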
Lua is fantastic for the config-as-code use case. Easy to read and write, with a lightweight embeddable interpreter, and it has almost universal library support across different languages/environments.
It's a shame that helm v3 didn't move forward with the lua engine[0]. I don't imagine ~=/1-based arrays were a worse timeline... And here we are 5 years later.
YAML itself is not too bad, especially with a good IDE. But YAML and its ws-sensitivity combined with Go templating is horrible. Every component by itself looks kinda reasonable, but when they come together it makes an unholy mess.
the best part is that Helm does some sort of preprocessing to strip out comments (i think) when templating and validating, so it will report a failure at a line number that doesn't actually correspond to the problem in the original source. tracking down template failures is infuriating because of this