	by Smaug123 924 days ago
	Unhygienic string templating of a whitespace-sensitive language is a truly new hell for those who have not experienced it before. Quite an astonishing decision.

9 comments

CipherThrowaway 923 days ago

I remember being blown away by this. Unhygienic templating and lack of template and serialization boundaries have been such a consistent disaster in our field for so many decades that it is hard to believe we are still dealing with this kind of design error in relatively new technology.

My feeling is we will probably all look back at the over-use of text based DSLs and configuration languages as a giant mistake. Not just with respect to K8s, but IaC, CI/CD config and the rest of the DevOps YAML/config language mess. In retrospect, it has been a case of simplistic instead of simple. "Declarative" config languages are hello world optimized, and what looked great at the beginning of the S-curve is starting to look pretty damn bad.

selfmodruntime 923 days ago

Yes! We are so close with the configuration as code paradigm. Why can't I configure K8s with Python or JavaScript?

Smaug123 923 days ago

You can with Pulumi, for example. I've only used it for provisioning a Digital Ocean droplet before, not for Kubernetes, but it was relatively painless. (Although I do find approximately one new panic in Pulumi every time I upgrade to a new version.)

necubi 924 days ago

Indeed, as someone who maintains a helm package it's mind-boggling. When I've been able to build k8s tooling from scratch, I've been reasonably happy with jsonnet [0], which is a constrained programming language designed for configuration. It has the property that it will always produce valid JSON.

[0] https://jsonnet.org/

pram 924 days ago

Seconded on jsonnet. I’ve started using it to generate json (that usually goes to an API) in gitlab ci pipelines. I use it to merge “gold standard” boilerplate configs with user changes.

mati365 924 days ago

Why not just plain JS or TypeScript?

8organicbits 924 days ago

JS and TS aren't configuration languages. If it wasn't for JS's early dominance and too-big-to-fail status on the web, we probably wouldn't use it for programming much. Platform and operations engineers often don't know JS well, favoring bash, python, and others for programming. Some exceptions exist, like full stack node shops.

Configuration languages like yaml, HCL, etc. are more reasonable alternatives.

baq 924 days ago

yaml is not a configuration language!

yaml is a tree serialization format with some human-targeted ergonomic features like comments and multiline strings built in.

it'll work for simple configuration files just as well as format-less .ini files will. for complex configurations, even xml is better, and that's saying a lot.

in practice for object graphs of any sort of complexity, like cloudformation or k8s configs, you want a programming language which can reduce the kolmogorov complexity of your configuration, because that dominates ops in the limit. or, IOW, configuration is code is configuration is code is ...
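A minimal sketch of that point, in Python: one function describes the shape once, and the per-service differences are the only thing left to write down. The `deployment` helper, the service names, and the image tags are hypothetical and only for illustration; the field layout loosely follows a k8s Deployment but is not validated against the real schema.

```python
import json


def deployment(name: str, image: str, replicas: int = 1) -> dict:
    """Build one Deployment-shaped object as plain data (illustrative fields only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical services: the only information that differs between manifests.
SERVICES = {"api": "example/api:1.0", "worker": "example/worker:1.0"}

manifests = [deployment(name, image, replicas=2) for name, image in SERVICES.items()]

# Serializing real data structures cannot produce an indentation bug.
print(json.dumps(manifests, indent=2))
```

The labels appear three times in every manifest but are written once, which is the kind of repetition a plain serialization format cannot factor out.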

pas 923 days ago

yes https://cdk8s.io/docs/latest/plus/

lambda_garden 923 days ago

Turing-complete and non-deterministic languages are a bad fit for configuration.

pas 923 days ago

Is YAML with go template hacks that much better? (Without any kind of type safety in a context where true != "true".)

out-of-ideas 921 days ago

go and jinja "templates" have always been ever so much fun. It depends on the complexity stored in said "configs"; what reads the configs and what writes to the configs.

as a human i want to be able to both read and write, easily, by knowing wtf is going on for the input(s), and to understand clearly within the config-at-hand, what does what etc. (without 5x paragraphs per 1 key-value tuple).

then insert more templates to generate more templates, and then it is a fun spiral

k8s reminds me of those 20min long commercials combined with the energizer bunny, "but wait there's more" .. "and there's more" .. "and there's more" - as with orchestrating infrastructure as microservices, there's always more and more complexity to add to the monster. More layers, security, networking, etc., etc.

pas 920 days ago

well, there's more, because operating software is exactly like that and k8s tries to API-fy that, in my opinion with considerable success.

yes, the actual descriptors are atrociously hideous, but it's okay, it's low-level, evolving pretty quickly, and there are nice high-level representations -- https://cdk8s.io/docs/latest/plus/

so, yes, of course, compared to FTP copying PHP files into cgi-bin k8s is more complex, but the feature set is also different.

of course, not everyone needs declarative gitops-based blue-green deployment with pristine dev/demo/staging/UAT envs on each new PR. and usually when people think they do it's mostly just FAANG envy :)

but, speaking from experience, setting up a k3s cluster is easy, cheap, and deploying things on it with a "kubectl apply" is also easy. setting up CronJobs to do backups to some S3-compatible thing is also quite doable, and so on. and you end up with a big bag of YAML. is it better than snapshotting a VM? who knows!

lambda_garden 919 days ago

No, that would also be a poor choice.

ahoka 924 days ago

Does not beat the “BIND zone files generated by m4 macros from a shell script” I had the honor of having to modify. Comes a close second, though.

cheekibreeki2 924 days ago

No raw sendmail.cf? :)

doubled112 924 days ago

Thanks, that little twitch under my eye just started up.

I worked at an org where an admin went a little rogue and updated the config without using the macro file. Another admin didn't realize. Well, didn't realize until it was too late.

rwmj 923 days ago

autoconf (configure.ac) files are also m4 hell, with some whitespace surprises.

phendrenad2 924 days ago

It's not really astonishing. People don't usually have knowledge beyond a quick google search. If you google XML, you'll see a lot of negative sentiment about XML specifically related to mid-00s Java frameworks and some people chunk this fact as "XML old and bad". If you google YAML, you'll see that lots of people like using it for relatively simple things (Rails config files...) and some people will chunk that fact as "YAML good".

mieubrisse 923 days ago

For me, one of the true advancements of JSON/YAML over XML is how its features are orthogonal to each other: I don't have to ask "should this be a tag property, or a subtag?"

Which, to your point, isn't to say that YAML is "good", but I think there was at least an advancement through minimalism.

phendrenad2 922 days ago

Right tool for the job. I think XML is too much tool for most jobs, but there's a huge gap between JSON/YAML and XML, so in 99% of cases I'd rather err on the "too much" side.

lemper 923 days ago

defo agree, mate. if data that is consumed by computer is really crucial, I'd better be sure to use any format that has clear AST, schema, or whatever ya call it. now I'd rather use xml / clojure's data sexpression / even json over yaml. it's just a gut feeling, in the most part.

pas 923 days ago

YAML and other human-friendly presentation/editor formats are cool, but they need to come with a schema/validation/types. k8s APIs have a bit of validation, but there are frustrating gaps.

smsm42 924 days ago

That has been one of the most annoying aspects of dealing with helm templates. It's like that USB plug you never can insert the right side up, only it has like a half-dozen sides and you have to cycle through them all to figure out the right one.

jscheel 924 days ago

1000 times yes. It is horrible to deal with. And the error reporting when it fails is often obtuse in my experience.

ljm 924 days ago

I don't think I'll ever understand why YAML was chosen as the language of choice for all things devops, and why so many startups have sought to augment yaml with syntactical hacks

I mean, as soon as you template it and need to do `{{ something | indent 4 }}` or some shit to make the template work you know you're on a bad track.

lukeschlather 924 days ago

I don't really think YAML is the problem, it's the string-based templating. I'd like to see Emrichen or something like it become more common. And Emrichen is format-agnostic, you can write your stuff in json or YAML and it looks and works pretty similarly. (Although I stick to yaml.)

https://github.com/con2/emrichen

If JSON had comments I might lean toward json.

kevincox 923 days ago

100%. Target-unaware templating is always a mistake. It is unfortunately common in our industry because it is easy and works most of the time. But this is also why SQL injections and XSS are among the most common vulnerabilities. SQL injections are getting better because people are more often using parameterized queries which never need to actually encode the values into the "template" and XSS is getting better because most big frameworks now have target-aware templating that properly serializes values. But these are both hugely common issues. Look up a tutorial about how to use SQL or make your first website and more likely than not you will see examples that have vulnerabilities. Our whole industry is teaching the wrong thing by default, then hoping to fix it later.
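To make the SQL half concrete, here is a small sketch using Python's standard `sqlite3` module. The table, the rows, and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Hypothetical attacker-controlled value.
hostile = "nobody' OR '1'='1"

# Target-unaware: the value is pasted into the SQL text, so its quote
# character ends the string literal and the rest is parsed as SQL.
unsafe_sql = f"SELECT name FROM users WHERE name = '{hostile}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the value is passed separately and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows))  # every row matches, because the WHERE clause became always-true
print(safe_rows)         # no user is literally named "nobody' OR '1'='1"
```

The parameterized call never builds a combined string at all, which is the property the string-templated version cannot offer.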

String concatenation may have been a mistake. The concept of a string may have been a mistake. Every sequence of bytes has some structure, and in order to mash two "strings" together they need to be serialized in the correct way. Even when building error messages it would be nice if you can reliably identify the "chrome" from the "content".

Also look at terminal escape sequences. When we print text to a terminal we should probably be replacing non-printable characters with some sort of encoding so that 1. the reader can understand that these are from the content, not the application, and 2. the content can't do stuff like delete the output line to trick the reader.
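One possible encoding, sketched in Python. The function name `escape_untrusted` and the choice of backslash escapes are assumptions for illustration, not an established convention.

```python
def escape_untrusted(text: str) -> str:
    """Make non-printable characters visible before text reaches a terminal."""
    out = []
    for ch in text:
        if ch == "\\":
            # Escape the escape character itself so the encoding stays unambiguous.
            out.append("\\\\")
        elif ch.isprintable():
            out.append(ch)
        else:
            # unicode_escape produces visible forms such as \r, \n, \x1b.
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)


# Carriage return plus "erase line" would normally hide the first word.
payload = "harmless\r\x1b[2Kall tests passed"
print(escape_untrusted(payload))
```

Printed this way, the control characters show up as literal `\r` and `\x1b` text instead of being interpreted by the terminal.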

Every time you put two strings together you should think about how you need to properly encode one into the other. Output unaware text templating almost always fails this because it relies on the user to do this for every single interpolation, and that is doomed to fail.

I wrote an even longer rant about this on my blog a while ago: https://kevincox.ca/2022/02/08/escape-everything/

jauntywundrkind 924 days ago

JSON5, JSONC, and others have comments.

Also, there's the old dirty hack

  {"item": "this is a comment",
  "item": "//so is this but more obvious",
  "item": "because in 99.99% of implementations, last value wins:",
  "item": 42}
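Python's standard `json` module is one of the implementations where the last value wins. A short sketch, which also shows how a parser can be told to refuse the trick (`reject_duplicates` is a hypothetical helper name):

```python
import json

doc = '{"item": "this is a comment", "item": "//so is this", "item": 42}'

# Default behaviour: earlier duplicates are silently dropped.
parsed = json.loads(doc)
print(parsed)


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of letting the last key win."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in object: {keys}")
    return dict(pairs)


try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)
```

The "comments" do not survive parsing, so any tool that reads and rewrites the file will lose them.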

bvrmn 924 days ago

It's kinda important to keep comments on automated transformations.

tbrownaw 924 days ago

It's "declarative" which is supposed to be good, and you can represent complex data structures without any annoying closing braces (like json) or tags (xml).

Plus, well, there's a disturbingly high number of people who think that semantic whitespace is a positive.

yeetcode 924 days ago

When did the universe decide braces were so bad? Makes formatting so much easier.

tbrownaw 924 days ago

Encoding the same information in both indentation and braces is redundant, and so is a violation of DRY, and so is bad.

(Of course the reality is that braces are how you write scoping information and indentation is how you read it, and CQRS is actually a good thing.)

lucasyvas 924 days ago

This is ridiculous. Indentation is a visual concern for humans and braces are a semantic concern for a machine. Code only needs the latter to actually execute.

You can prefer that they be coupled for taste reasons but they are fundamentally different. Code minification is a practical example where you can save many characters by omitting whitespace.

This is like tabs vs spaces. The white spaces crowd prefer the trade offs, and the tab crowd don't. But the white space crowd can't argue the perks of tabs - they exist.

disclaimer: Am braces/whitespace camp.

happymellon 924 days ago

The tabs/spaces war was fought because of IDE choices that restricted your ability for sane tab rendering defaults, and simple overriding of tab size.

Now that terrible text editors won, I have to watch a 2 Vs 4 space war.

If only we could have a character that could represent indentation and people could set the rendering so they can visualise it in their own preferred way.

And folks who want to remove braces should be pointed to the Apple certificate snafu.

the_gipsy 923 days ago

But you see that indentation is always written in the source plaintext, together with the braces, right?

wizerdrobe 924 days ago

I’m always amazed at coworkers with a 100 WPM typing speed and IntelliSense griping about how hard it is to type.

jauntywundrkind 924 days ago

Inbound links from HN blocked I think, but Tom Macwright's proclamation that typing is not the problem has long stuck with me. https://macwright.com/2015/01/19/typing-is-not-the-problem

These optimizations for some perceived ergonomic win almost always make terrible tradeoffs versus using a good well established data format. And especially systems which favor human consumption but create extreme difficulties for machine handling, those are the worst!

Yaml being such a non-Context Free Grammar is a huge pain. There's so much state in the parser. It only gets worse from there. Yaml has all kinds of wild crazy capabilities. References, a variety of inline content blocks, and weird ways to invoke stuff?? GitHub yesterday did a code review of Frigate, an enormously popular surveillance video analysis tool that's heavily downloaded, and found, oh yes, a huge glaring yaml bug allowing remote execution, because executing arbitrary code is just built right in to yaml amid 3000 other crazy hacks & who would have known to go look for & disable that capability?! https://news.ycombinator.com/item?id=38630295 https://github.blog/2023-12-13-securing-our-home-labs-frigat...

Typing is not the problem (even though I see so many people just terrible beyond words at navigating project structures or the command line... Improve! Some day!).

mieubrisse 923 days ago

I do think there's a power to the readability that makes it more approachable (but which eventually burns you). We were rewriting our AST at Kurtosis last year, and the default choice we were going to go with was of course YAML. But we came across a Github issue from DroneCI (who also started with YAML) that said something like, "we started with YAML, and we learned that you always eventually want to add more complex logic on top. Go with a Turing-complete language to begin with, else you'll be in the CircleCI trap inventing a language via YAML DSL."

We decided to go with Starlark as the base language for our DSL, and we've had a consistently great experience. Users report that it's very approachable, and the starlark-go library is very pleasant to deal with.

kevincox 923 days ago

I totally agree that string-templating into a data serialization format is a mistake. But you can make life dramatically easier on yourself by doing `{{ something | toJson }}`. In fact write a linter that every single substitution is followed by `| toJson` and you will save yourself a lot of headaches.

The main issue is that it makes it more difficult to mix hardcoded and inserted values.

    labels:
        - mylabel
        - {{ extraLabels | indent 4 }} # toJson doesn't work here.

Also the small technical concern that YAML isn't actually a superset of JSON. (But you are far less likely to hit these cases than other escaping bugs).
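The same idea outside of Helm, as a rough Python sketch: `json.dumps` stands in for `toJson`, and the label values are invented to be awkward on purpose. Merging the hardcoded and inserted values as data before encoding sidesteps the mixing problem, at the cost of no longer writing the list in block style.

```python
import json

# Hypothetical user-supplied labels containing YAML-significant characters.
extra_labels = ["team: payments", "note: #1 priority", "multi\nline"]

# Target-unaware: paste each value into the YAML text as-is.
# "team: payments" would read as a mapping, " #1 priority" as a comment,
# and the embedded newline breaks the list item apart.
naive = "labels:\n" + "".join(f"  - {label}\n" for label in extra_labels)

# Target-aware: build the whole list as data, then encode it once.
# JSON arrays and strings are valid YAML flow syntax in all but rare edge cases.
encoded = json.dumps(["mylabel", *extra_labels])
safe = "labels: " + encoded + "\n"

print(naive)
print(safe)
```

No `indent` call is needed because the encoded value fits on one line regardless of what the labels contain.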

mdaniel 923 days ago

As far as I know the way helm thinks about that problem is putting any such literals into a temp copy _then_ serializing https://masterminds.github.io/sprig/lists.html#append-mustap...

  labels: {{ append .extraLabels "myLabel" "myOtherLabel" | toJson }}

your commentary talks about json but your code snippet is in yaml, so it's possible one or both of us are solving the wrong problem

kevincox 923 days ago

YAML is (almost) a superset of JSON. So it is much easier to serialize data into JSON than to worry about indentation.

mdaniel 923 days ago

I think you are lobbying the wrong person about the relationship between yaml and json; I was trying to point out that you had

  thing:
  - item1
  - {{ foo | toJson }} # <-- is not going to do what you expect

unless you quite literally wanted

  thing:
  - alpha
  - - alpha1
    - alpha2

kevincox 923 days ago

Ah, I did have a typo. But you also copied it wrong. I wrote this

    labels:
        - mylabel
        - {{ extraLabels | indent 4 }} # toJson doesn't work here.

Which has an extra `-` and as you pointed out would produce a nested list.

But I meant this which would merge the lists.

    labels:
        - mylabel
        {{ extraLabels | indent 4 }} # toJson doesn't work here.

nucleardog 924 days ago

The other popular option would be JSON.

No comments and having to escape.... well, practically everything you'd commonly be entering makes JSON suck at this sort of task.

taspeotis 924 days ago

Yelling At My Laptop

Too 924 days ago

Yaml has a lot of problems, this is not one of them.

This more shows that one shouldn’t be using string templating to create data structures in the first place.

bbkane 924 days ago

What would you recommend instead? Personally, I think Typescript makes a great config language

jpgvm 924 days ago

Dedicated config languages are the best usually. Jsonnet/Cue/friends.

Failing that if you actually need/want a procedural/non-pure language then I think Kotlin or Ruby take the cake. Both have extremely strong support for DSLs which IMO is key to reaching a modicum of usability.

sakjur 924 days ago

Starlark is nice as well, it’s syntactically based on Python, and behaves a lot like regular procedural languages, but it’s meant to provide a pure and safe environment for configuration and be embeddable.

mieubrisse 923 days ago

Big +1. We switched to Starlark for our DSL last year and have been very pleased. Users who've never used Starlark before come in with some 'what, another language?' trepidation, and end up pleasantly surprised.

jpgvm 923 days ago

Since using Bazel a bit I have grown an appreciation for Starlark also. The big thing is that list and dict comprehensions are a really nice fit for these types of tasks.

10000truths 924 days ago

Lua is fantastic for the config-as-code use case. Easy to read and write, with a lightweight embeddable interpreter, and it has almost universal library support across different languages/environments.

ishigoemon 924 days ago

It's a shame that helm v3 didn't move forward with the lua engine[0]. I don't imagine ~=/1-based arrays were a worse timeline... And here we are 5 years later.

[0] https://github.com/helm/helm/issues/5084

ljm 923 days ago

I dunno - using cdk8s at one place was a miserable experience. Adding runtime evaluation just made everything more complex than it should be.

In that case I preferred to describe infrastructure rather than program it.

maximus-decimus 923 days ago

ohhh, that's why there are "indent 4". it just never clicked for me I guess. I thought it was just some dark magic incantations.

smsm42 924 days ago

YAML itself is not too bad, especially with a good IDE. But YAML and its ws-sensitivity combined with Go templating is horrible. Every component by itself looks kinda reasonable, but when they come together it makes an unholy mess.

fivre 924 days ago

the best part is that Helm does some sort of preprocessing to strip out comments (i think) when templating and validating, so it will report a failure at a line number that doesn't actually correspond to the problem in the original source. tracking down template failures is infuriating because of this

lambda_garden 923 days ago

Is it possible to use Dhall to generate YAML and side-step this issue?

pas 923 days ago

yes, of course, but the problem is that upstream (as in the helm chart ecosystem) doesn't want that, they just want to live in their little bubble.