by aduitsis 1861 days ago
First of all, congratulations to the Ansible team. These tools have benefited us in numerous ways that cannot be overstated.

Having said that, this caught my eye:

> There may be backwards incompatibilities in the core playbook language. Please see the porting guide for details.

Making incompatible changes is not specific to Ansible; Puppet, for example, has also done it time and again. We used both Ansible and Puppet in my previous job, and we always found it mildly annoying that upgrading a system (Linux, FreeBSD) would, in addition to the other "usual" dangers, bring along the danger of the new version not having a package for the "old" version of Puppet or Ansible that we were using. That would force us to divert attention to the automation tool's problems instead of using the automation tool to solve problems.

I do understand that nothing can stand still and everything must evolve and change, but at some point this acquires the flavour of changes for the sake of changes. Especially when regressions happen and things that were working perfectly are now breaking, it is not exactly pleasant to have to devote time to them.

It's not exactly clear how can Perl programs or shell scripts or Makefiles from 20 years ago play perfectly fine unchanged, but the syntax of a manifest or playbook that does a couple of simple operations cannot remain stable. It's not like those tools were created yesterday, in which case it would be reasonable to expect changes in their first years.

11 comments

The reason why kubernetes et al. are so popular is that the orchestration tools like ansible, puppet, salt, chef... they all failed to deliver ultimately great software. Plagued by mediocre quality external modules and a constant churn of semi-backwards incompatible changes which required frequent maintenance, they have just been hard to use.

You ended up with lots of magic and internal domain knowledge about how to do things, and often had to get clever to get the result you wanted. They were often somehow both not opinionated and too opinionated, and generally suffered from being "DSLs" (domain-specific languages) that were really just syntactic magic in various languages, when they needed to be either actual programming languages or more explicitly policy-setting frameworks. They were in the middle of several extremes, tried to be lots of things at once, and generally just didn't do a great job, unfortunately.

> You ended up with lots of magic and internal domain knowledge about how to do things, and often had to get clever to get the result you wanted.

How does Kubernetes not also result in the same? I'm not questioning the many improvements and the somewhat different domain it brings to the table overall, but I'm genuinely curious. From what I've seen, any tool designed to integrate and manage complex, disparate systems is going to end up with lots of special cases and domain knowledge, and will require digging into implementation details on a regular basis for anything other than absolute common-case uses.

> How does Kubernetes not also result in the same?

Here's the difference between kubernetes and ansible (and the like). I'm over-simplifying a bit of course, but the idea is there:

* With kubernetes, you write down the state you want your infrastructure to be in, and k8s figures out how to get there

* With ansible, you write down how to get to the state you want and ansible runs those steps, and hopefully they end up in the same state. The issue is that if you run the same set of steps on a Debian Linux distribution and on an Ubuntu distribution, well, they might end up in a similar state, but not actually the same one; for starters, one is Debian and the other is Ubuntu.
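The two styles above can be sketched in a few lines of Python. This is only a toy model: `reconcile` and the service names are made up for illustration and are not part of any Kubernetes or Ansible API. The declarative side gets a desired state plus whatever the current state happens to be, and derives the steps itself.

```python
# Toy model only: reconcile() is a hypothetical helper, not a real k8s/Ansible call.

def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Declarative style: diff desired vs. actual instance counts, derive the steps."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    # Anything running that is not declared gets removed as well.
    for name in sorted(set(actual) - set(desired)):
        actions.append(f"stop {actual[name]} x {name}")
    return actions


desired = {"web": 3, "worker": 1}

# Different starting points produce different steps but the same end state.
print(reconcile(desired, {"web": 1, "cache": 2}))
print(reconcile(desired, {"web": 3, "worker": 1}))  # nothing left to do
```

An imperative script would instead be a fixed list of steps that runs the same way regardless of where the machine started, which is where the "similar but not identical" end states come from.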

Ansible does try to do the 'write down the state you want your infrastructure to be in' thing. E.g.,

    - name: "Ensure sshd is started and enabled on boot"
      systemd:
        state: started
        enabled: yes
        name: sshd

You're not saying 'enable the sshd service', but 'the desired state for sshd is started and enabled'.

But this is not perfect, because it's easy to eventually end up with imperative commands, like "run this script", or "Copy this file".

That does not seem avoidable. I have never used kubernetes, but this seems highly suspect:

>With kubernetes, you write down the state you want your infrastructure to be in, and k8s figures out how to get there

My guess is that this devolves to "run this script" or "copy this file" in practice as well. Or it just becomes tautological, i.e. "here's the state I want the system to be in (and that means run this script and then copy this file over here, and don't forget to symlink it over there)". Am I wrong?

Kubernetes itself tries Very Hard to completely abstract those implementation details. Like, obsessively hard. You (assuming "you" are somebody who wants to run something in k8s) should never need to know or care what it's doing "under the hood" to run your pod in the way you say.

What's happened though is that Kubernetes expects you to hand it an artifact that encapsulates all of the messy steps required before it can "run the thing." In practice that means the "run the script and symlink stuff" kind of glue is still present, it's just moved into Dockerfiles (which are, more or less, just fancy shell scripts themselves), build scripts, CI/CD pipelines, and startup scripts.

The big shift is more that the responsibility for managing this junk is pushed, organizationally, onto whoever knows how to operate the actual application (because they'll be creating the Dockerfile). Usually that's a developer.

So it's not wrong to say:

> With kubernetes, you write down the state you want your infrastructure to be in, and k8s figures out how to get there

But it's not really a complete explanation, because it assumes you've already gotten to the point that you've bundled an artifact up that will "do the right thing" when Kubernetes provides the fundamental pieces of infrastructure you define in your template.

Yeah you're (mostly) incorrect. I recommend looking into terraform and how it handles state, that will give you a better idea of how it's possible to do infrastructure in a "this is what i want, figure out how to get there" way.

Terraform isn't the best implementation of it; k8s has full control over the infra so it's even more powerful in that regard.

Kubernetes doesn't attempt to be an API that can solve any orchestration issue, it is an abstraction on the underlying architecture. It's a lot easier to be backwards compatible here (and if you are not, create an operator that is).

Now whether it's the right abstraction is a matter of debate. But ultimately, like VMs or containers, it's so far proven to be a valuable tool, at least for medium-sized companies.

and then you use something like helm to orchestrate and end up being in the same place where you left ansible:

lots of environment specific magic and flow-breaking with every upgrade of anything

same with cloud envs and terraform

One of the best things about K8s is its pretty stable API. We use GKE and the upgrades happen without issues 99% of the time.

The other benefit that k8s got for "free" was with containers: developers could stuff whatever environment, software and configs they wanted inside the container. This removed most of the "my app broke because some shared lib didn't install properly" sorts of issues. This separation between software configuration and application configuration is extremely powerful.

Yea, I think ansible (and stuff like puppet) and kubernetes are both powerful tools that are pretty leaky abstractions. They are great for sharing and implementing recipes, but that abstraction layer makes it a little more difficult to become a master chef and understand/control exactly what's happening. They are awesome to have in your toolbox, but I still find it incredibly useful to have a tool like python in there as well.

I’m not saying kube really fixes all of these problems or is a better solution, but it exists as a response to a crappy solution. In other words, one problematic solution replaced another problematic solution. Stay in the field long enough and you see a pendulum where one kind of solution with problems is replaced by a different kind of solution with different problems... and then it swings back and a new generation goes back to a new iteration of the first kind of solution and gets to rediscover the old problems again, back and forth over and over.

Who said it doesn't? Look at its schema, it's horrific.

The reason it's popular is that it's backed by Google and solves problems Docker the company couldn't.

That doesn't mean k8s isn't suffering from the same problem of being a DSL that should be something else.

It also doesn't mean k8s won't suffer the same weirdness between platforms (AWS vs GCE vs GKE).

> The reason why kubernetes et al. are so popular is that the orchestration tools like ansible, puppet, salt, chef... they all failed to deliver ultimately great software. Plagued by mediocre quality external modules and a constant churn of semi-backwards incompatible changes which required frequent maintenance, they have just been hard to use.

This is a good point, and most problems came because the tools were still only involved in mutating state of the machine. They eventually evolved to be declarative, but that's just cosmetic.

In the CM tools, if you added a declaration that a package should be installed and later removed that declaration, the package would still be there. You had to add an entry saying that it should be uninstalled.

This basically ensures that machines that are configured the same way often end up drifting apart.
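A small Python sketch of that drift, under the assumption that a manifest is just a mapping of package name to "present" or "absent" and that unmentioned packages are left alone (the `apply` function and package names are invented for illustration, not taken from any CM tool):

```python
def apply(installed: set[str], manifest: dict[str, str]) -> set[str]:
    """Apply a {package: "present" | "absent"} manifest to a host.

    Packages the manifest does not mention are left untouched,
    which is the behaviour described above.
    """
    result = set(installed)
    for package, state in manifest.items():
        if state == "present":
            result.add(package)
        elif state == "absent":
            result.discard(package)
    return result


v1 = {"nginx": "present", "htop": "present"}
v2 = {"nginx": "present"}  # the htop entry was simply deleted

old_host = apply(apply(set(), v1), v2)  # has lived through both versions
new_host = apply(set(), v2)             # provisioned fresh from v2

print(sorted(old_host))  # ['htop', 'nginx']
print(sorted(new_host))  # ['nginx']

# Only an explicit "absent" entry brings the two hosts back in line.
v3 = {"nginx": "present", "htop": "absent"}
print(apply(old_host, v3) == apply(new_host, v3))  # True
```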

NixOS solves this problem with a declarative language that, instead of describing what should be updated, describes how the entire system should be built. When you change the configuration it actually rebuilds the OS from scratch. It might seem like a lengthy process, but it is actually quick because of caching. Nix just fetches the missing pieces from the repo and places them in its store, rebuilds things that are not in the binary cache, and then updates symlinks to the new locations. Because it uses symlinks, upgrades are atomic, and you can also roll back your changes.
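The symlink trick is easy to try outside of Nix. The sketch below is not how Nix itself is implemented; it only shows the general idea on a POSIX system, with made-up directory names (`gen-1`, `gen-2`, `current`) and a hypothetical `activate` helper. Each "generation" is built in its own directory, and switching is a single atomic rename of a symlink.

```python
import os
import tempfile
from pathlib import Path


def activate(generation: Path, current: Path) -> None:
    """Point the `current` symlink at `generation` in one atomic step (POSIX)."""
    tmp = current.with_name(current.name + ".tmp")
    tmp.unlink(missing_ok=True)   # clear a leftover from an interrupted switch
    tmp.symlink_to(generation)    # build the new link next to the old one
    os.replace(tmp, current)      # rename() swaps the link itself atomically


with tempfile.TemporaryDirectory() as root:
    store = Path(root)
    for number in (1, 2):
        gen = store / f"gen-{number}"
        gen.mkdir()
        (gen / "version").write_text(str(number))

    current = store / "current"
    activate(store / "gen-1", current)
    activate(store / "gen-2", current)           # upgrade
    print((current / "version").read_text())     # 2
    activate(store / "gen-1", current)           # roll back
    print((current / "version").read_text())     # 1
```

Readers of `current` see either the old generation or the new one, never a half-updated mix, and rolling back is the same operation pointed at an older directory.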

The catch? It is a paradigm shift, it also doesn't help that the language used is functional. So it's a steep learning curve.

Nix is an incredibly far-sighted vision, and it's great to work with... once you get your head around it all. I much prefer to build containers using it, and it does a much better job of "provisioning" than any other tool I've used. I remember being appalled when I first used Ansible and discovered that adding a package, then removing the stanza to add that package, resulted in the package persisting on the system.

I've found that every new technology that solves problems also comes with its own set of problems. :)

Ansible, chef etc. are provisioning tools, not container orchestrators.

The point is that had they worked well, containers may not have become as popular as they are now.

Containers are only somewhat used for scalable microservices. They're mostly just an easier way to deploy software than creating your own RPM package and using orchestration tools to deploy it on VMs.

I hope you do understand the difference between server provisioning and container orchestration; as mentioned above, they solve completely different problems.

Those things solve completely different problems.

> The reason why kubernetes et al. are so popular is that the orchestration tools like ansible, puppet, salt, chef... they all failed to deliver ultimately great software.

This assertion makes absolutely no sense at all, and reads like a Markov chain of buzzwords.

Kubernetes is container autoscaling, blue-green deployments, revertible deployment histories.

Ansible and the like are configuration as code. That's it.

The closest thing there is between kubernetes and configuration-as-code tools is Docker's Dockerfile.

I think it still makes some sense. For example, if you look at the process of software development like this:

  [business requirements] -> ... -> [deployed and working software]

Then it stands to reason that Ansible and Kubernetes are just two ways of getting to the same end result.

Where they differ, is the approaches that they take to achieving that result:

  Ansible: [run a playbook ahead of time] -> [the environment for your app will be set up] -> [you can deploy to it through Jenkins/GitLab/whatever] -> [your app will run, but Ansible will know almost nothing about it unless run]
  Kubernetes: [create a cluster ahead of time] -> [you'll be able to run any containers on it] -> [you can deploy to a registry through Jenkins/GitLab/whatever] -> [Kubernetes will make sure that your app remains running]

So essentially, i believe that many people have the stance, that they can probably just set up an environment for an application and forget about it (or at least not constantly monitor it), which is more in line with what Ansible provides, whereas Kubernetes is more suited to situations where it's impossible to have "stable software" (i.e. most business applications, since they don't have the quality of projects like the Linux kernel).

If you want fault tolerance, you have to do additional work with Ansible, like writing systemd services for restarts, manually setting up your load balancers and working on ensuring proper failover, as well as service discovery. And i'd reason that if something is hard to do, oftentimes it simply won't be done at all! Yet with Kubernetes, a lot of that comes out of the box (even though i think that it's often too complicated, K3s does seem sane, but personally Docker Swarm or Hashicorp Nomad hit the sweet spot of features vs complexity).

Ansible is way, way, way more popular than Kubernetes in the real world.

(Yeah, k8s does a whole lot more and is more opinionated, but still...)

Exactly: not everyone gets to play with a kube cluster. Also, there are a lot of sysadmins who don't trust docker-in-prod...

It sounds like you think kubernetes et al. are a suitable replacement for ansible et al.; however, k8s is not able to provision settings on a Web Application Firewall from F5 or the iLO links on your HP blade server farm, et al. Your statement works if you are letting AWS/MS/Google do all that for you and all you are responsible for are package dependencies and joining the right VLANs. Not everyone is so lucky, and actual Configuration Management tools like Ansible+, while not perfect, make Domain Specific (i.e. real world Multi-Vendor) large environments run on time most of the time.

I actually don’t like Kubernetes.

And of course you’re right, the use cases don’t totally overlap. However there is a big chunk of the core functionality which is absolutely competing ways to solve the same set of problems.

> It's not exactly clear how can Perl programs or shell scripts or Makefiles from 20 years ago play perfectly fine unchanged, but the syntax of a manifest or playbook that does a couple of simple operations cannot remain stable.

It's clear when you deal with both types of systems. Perl optimized for stability and backwards compatibility at the cost of new features and new language conveniences. This is great and a strength when all your competitors are playing catch-up or don't have enough market share to make an impact, but once that's not the case, anything new they bring to the table is a point in their favor, and you get what we have now, the long slow death of Perl.

Backwards compatibility is wonderful until there's a really useful feature that it prevents, then all of a sudden it's a more complex story because it's not just preventing busywork, it's preventing progress.

There seems to have been a strong swing in the past decade or so towards busywork being not just acceptable but in some way laudable. "Doing the work" to keep your product using all latest versions is noble, whether or not you actually gain anything by upgrading. I can't shake the feeling that it all ties in to the industry wide push for SaaS and subscription models and evergreen software, all of which share an underlying motivation of everlasting job security for software companies and the people who work for them.

It's a similar attitude to the parts of the OSS community who feel that anyone complaining about poor documentation or usability just "doesn't want to do the work" to understand the system.

This is happening in Django land right now.
It's fun how one's own perspectives and opinions change and adapt with the experience of facing real-world problems over years of professional (or otherwise any kind of) experience.

I used to be against the idea of 3rd party software repositories, and was of the opinion that one should strive to depend as much as possible on system-provided packages.

But the same experience that you mention now, is what gave my point of view a 180 degree change, and now I'd rather install something from pip or NPM. I can just pin the desired version, and they more or less have the policy of holding past versions in a frozen state, so if I know for a fact that version 1.2.3 is exactly right for my needs, I'll keep using that one regardless of system.

The weak point of packaging systems (my experience is Debian and Ubuntu) is that normally there are not many (if _any_ at all) alternative versions offered for the same software. They are built with the general assumption that you should use the latest ones, and of course they don't even attempt to offer a history of versions of every package. So every time you upgrade, what you describe happens: "divert attention to the tool's problems instead of using the tool to solve problems" (very good wording by the way, I liked it a lot).

With version 4.0.0, they've implemented semantic versioning, which should make it easier to avoid backwards-incompatible changes.

https://semver.org
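For what that buys you in practice, here is a minimal Python sketch of the rule semver promises for plain `MAJOR.MINOR.PATCH` versions: only a major bump is allowed to break the public API. The helper names are made up, and the sketch deliberately ignores pre-release tags, build metadata and the special `0.x` rules.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_breaking_upgrade(current: str, candidate: str) -> bool:
    """Under semver, only a major-version bump may break compatibility."""
    return parse(candidate)[0] > parse(current)[0]


print(is_breaking_upgrade("4.0.0", "4.1.0"))  # False: new features, same API
print(is_breaking_upgrade("3.4.0", "4.0.0"))  # True: read the porting guide first
```

Note that this only tells you when to expect breakage; it does nothing to reduce how much work the major upgrade is.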

All semver does is tell you you are pinned to some obsolete tool, and you are going to go through a soul sucking 6 months of yak shaving to stay up to date. (rails programmer since v3 here)

Semver doesn't make life easier. Making backward compatible changes is what makes life easier.

Does that make it easier, or does it just mean you'll be more aware upfront that to get the security fix or the support you need for some new feature, you have to upgrade to an incompatible version?
if it was just incompatible syntax changes, that would be one thing. but it's not - it's incompatible administration and structure changes. they just stabilized the new modules shift a couple months ago, which restructured all their documentation and issue tracking, breaking uncountable links and closing thousands of open issues. and that wasn't the first time they'd done that.

at this point i've started transitioning as much of my administration stuff as i can away from ansible. there's some good stuff in their codebase, but it's easier for me to find it in the source code and copy-paste it into my own projects than it is for me to work within the ansible structure. ansible solved a lot of problems for us, but over the time we've been using it i'm pretty sure it's caused more problems than it's solved.

> at this point i've started transitioning as much of my administration stuff as i can away from ansible

Any preferred alternatives you can share?

There's pyinfra (https://github.com/Fizzadar/pyinfra) if you like Python (disclaimer: I started the project), heavily inspired by Ansible just without the yaml/jinja2 hybrid stuff.
I've been following pyinfra precisely because every new Ansible release makes my heart sink knowing how much work it's going to cause me and how inscrutable debugging our playbooks is every time. I've been trying to find the time to port over all our existing Ansible playbooks but have really only scratched the surface so far. However, I have high hopes, pyinfra looks like a well thought through project.
Join the club of staying on Ansible 2.9 forever! It’s 2015 MacBook level good and still being developed. 2.9.21 was released two weeks ago and 2.9.22 is coming soon.

The way Ansible is developed it’s way easier to fork your own and backport bug fixes and features you need because it’s a toss up whether the fix you need actually got merged or whether it languishes in the GH issues.

i haven't even explored the options for shifting to another ansible-like system, i've just gone back to the supposed "bad old days" of a collection of random tools with shell scripts to fill in the gaps.

a combination of docker containers and terraform (for managing aws) is doing most of the heavy lifting for me.

I wrote my own remote scriptrunner thing because I hate Ansible so much.
I did this before with perl’s SSH::Batch. You can install it from CPAN and get really simple ability to run code on groups of servers in parallel. Works great.

Eventually I switched to Ansible because it made the more complicated things a lot simpler.

I use bash scripts too on almost all my systems. Containerize deployments. Dockerfiles are a godsend and work indefinitely.
I am not very familiar with Ansible/Puppet etc. However, working in Java and dealing with frameworks like Spring/Boot, Kafka etc., I notice a phenomenon of a large number of breaking changes, especially when one updates every couple of years as opposed to with every release. When I dig into the changes, they just point to an extremely poor design to begin with, one that couldn't handle a slight change in requirements without breaking changes.

I do not think it is a case of hindsight being 20/20. These are basic things, like an object that needs multiple fields but is rigidly defined with 2 fixed strings, so now one needs an object V2 to update it. Or all functionality dumped at the base URL, so there is no scope for adding sub-functionality without breaking the whole pattern. I can understand that no one can plan for how complex functionality evolves over time. But here I see perennial amateur hour with every release. And it is doubly irritating when these frameworks are already a many-layers-deep stack of bullshit abstractions, so ideally they should be able to handle changes without breaking.

I tend to think this is an issue with 'designing' systems vs 'evolving' them. When you build something to 'scratch your own itch' and it suddenly becomes popular, lots of stuff you didn't think of tends to pop up. Maybe all the market studies, focus groups, ad nauseam in 'traditional' BigCo development actually did have some value.

Ansible at least had the advantage of being a 'broad' tool from the get go. So a lot of the common use cases were covered. But I'm sure surprises still popped up :-)

Can you give some examples of breaking changes in spring boot? Thinking of using it for some projects
How is this possible in 2021:

"Due to a limitation in pip, if you are upgrading from Ansible 3 (or earlier), you need to uninstall Ansible and Ansible Base before installing Ansible 4"

https://xkcd.com/1987/

Python does some things really well, but its package managers leave something to be desired [especially for the uninitiated].

I've never used things like Ansible or Puppet in a professional environment, but I can appreciate how difficult my recommendation is. If you're using a technology for a significant part of your business, decouple it from the OS. Where that line is drawn is subjective.

I don't even think backwards incompatibilities are the biggest reason. In practice, the problem is often around bugfixes and upgrades. Upgrades are much easier if you can do them independently. It sucks to have to upgrade the kernel and other tooling just to grab a small bugfix--or hold off on a transition waiting until both OS issues and code base issues are resolved.

> It's not exactly clear how can Perl programs or shell scripts or Makefiles from 20 years ago play perfectly fine unchanged

In all likelihood they don't. For open source projects they probably have a lot of workarounds and cruft to support a variety of environments and spans of versions they are both written against and tested against. Internal code is written and tested against a specific version relying on your proprietary environment.

what do you mean decouple it from the OS ?

this is not coupled to the OS to begin with, ansible is just a bunch of python modules in a long trenchcoat

> we always found it mildly annoying that upgrading a system (Linux, FreeBSD) would [...] bring along the danger of the new version not having a package for the "old" version of Puppet or Ansible that we were using.

If this is the only issue, it sounds only mildly annoying. Either upgrade your tooling to use the newer version or be prepared to package up that old version of Puppet/Ansible, as long as you control what version is installed and are comfortable with the range of versions the package manager offers.

The type of scenario I had in mind is relying on something like Python, Ruby, Perl, etc. Something the OS may also deeply rely on. This came up a lot when macOS stopped updating a lot of GPL2 stuff to avoid shipping GPL3 or when they announced they'd stop shipping Python, Perl, and Ruby. You also see it when people try and change system Python version outside of the package manager in Linux and can no longer boot--the same thing can just as easily happen with libraries.

I feel you: in our environment, we stayed on Puppet 3 for exactly the sort of reason you call out for the longest time, and upgrading our Puppet codebase to be 4+ compatible was a major effort.

OTOH, in the case of Ansible, I've had good success with `pip install --user` to get a local (to my user) Ansible install that's separate from the system, and it looks like specific versions of software can be requested.

Yes, that's another reason I am sticking with Bash for teaching for my book on deployments. Your Bash will only change if your tools change. If you want to know how things work under the hood, you don't want to go through a rabbit hole of Python modules that also change beneath you.
One would think at the very least you should be able to specify a version of your script some place and it be able to parse all prior versions.
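A rough Python sketch of what that could look like, assuming the script carries a `format_version` field and the tool ships one small upgrade function per past version. Everything here (the field names, the two format changes, the `load` helper) is invented for illustration and is not how Ansible or Puppet actually work.

```python
# Hypothetical document format; the keys and the changes between versions are made up.

def v1_to_v2(doc: dict) -> dict:
    """v2 renamed the "user" key to "run_as"."""
    doc = dict(doc)
    doc["run_as"] = doc.pop("user", "root")
    doc["format_version"] = 2
    return doc


def v2_to_v3(doc: dict) -> dict:
    """v3 requires "hosts" to be a list instead of a single string."""
    doc = dict(doc)
    hosts = doc.get("hosts", [])
    doc["hosts"] = [hosts] if isinstance(hosts, str) else list(hosts)
    doc["format_version"] = 3
    return doc


MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3


def load(doc: dict) -> dict:
    """Upgrade a document of any known version to the latest format."""
    version = doc.get("format_version", 1)  # unversioned documents count as v1
    while version < LATEST:
        doc = MIGRATIONS[version](doc)
        version = doc["format_version"]
    return doc


old_script = {"format_version": 1, "hosts": "web01", "user": "deploy"}
print(load(old_script))
```

The cost is that every past format change has to stay in the codebase as a migration step, which is presumably part of why tools tend to drop old syntax instead.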