Hacker News
by chologrande 1099 days ago
After reading this post, I've browsed the site. I'm not sure how this is anything but significantly worse than the current model?

I've been around long enough to know that any "no code" style interface or GUI is typically the _problem_, not the solution. Regardless of the code they export, you end up with fat fingers, misclicks, forgotten UI paths to follow... Taking a software eng approach to shipping infra is a stable, known process that the infra team and the software teams can understand, no specialized GUI tool knowledge required.

I've been using the same basic terraform modules, jenkins pipelines, and infra architecture for nearly 7 years across multiple companies and numerous cloud deployments. It's not fancy but it justworks.jpg. Every time I re-use that code for a new deployment or account I save TONS of time.

Devops doesn't have to be hard. Infrastructure doesn't have to be complex. Deploying every day isn't _that_ difficult. The KISS method is key, especially when you're looking for speed. Using _fewer_ tools from the CNCF, not adding a new one, is better and will let you move faster.

6 comments

> Devops doesn't have to be hard. Infrastructure doesn't have to be complex

That's simply not true for anything larger than a few services and a small dev team. The cloud is very complex to do right when you focus on security, performance, and scalability. And Terraform invariably devolves into a nightmare when you have a ton of resources with dependencies between them.

I'm definitely not google scale, but we're global, in over 300 cities spanning ~30 countries. On an avg day we process well over 25k rps on multiple services. Simple architecture and IaC like terraform is exactly how we manage the dependencies. It's the solution, not the problem.
You think you have simple architecture when you’ve introduced Terraform to what is, based on your statistics, a two server use case. A PlayStation is capable of 25 kRPS and probably its data iops, too. Buy another one and you’re HA.

You’re trapped in the complexity of the method and think you’ve achieved nirvana. This comment reminds me of those demos when Hadoop was the rage, where people would do a $4 million Hadoop ETL on their laptop and shut up a room.

You're assuming you know the use case. We do more than serve a LAMP stack. It takes more than a few PlayStations, I can assure you.
No, I don’t need to know the use case. There are a vanishingly small number of use cases that cannot be performed 25,000 times per second on hardware from 25 years ago. If you indeed work on one of those few cases, you wouldn’t simultaneously call Terraform and public utility cloud simple architecture for any use case relevant to Hacker News discussion; that’s just plainly false beyond a certain level of computing depth, i.e., after you’ve written a process scheduler in an operating system or a supercomputer. (Note that I’m not calling you inexperienced. I’m talking about exposure to diverse types of computing, or, more realistically, the papers those communities develop.)

Those two ideas, that 25k is hard and Terraform is easy, are incongruous positions to hold from my perspective and basically prove the point I made. I understand if that’s not as obvious to you. The Web and cloud trap people into believing the world you’re living in is computing, and that the computers you’re working with go a certain speed on the road. There’s a lot of infrastructure in between you and computing in the model you’re working in, and it’s not apparent to you as unnecessary to compute. Computers are capable of far, far, far more than the entire industry thinks. That’s why those Hadoop takedown demos made me smile back in the day, and why I can’t wait to demo against $10 million of Kubernetes eating companies of the future alive.

Or yeah, blow my mind with your workload that can’t be tackled in a few shakes of a PlayStation’s tail with strong vector units nearby (the reason I specifically mentioned a PlayStation).

"I don't need to know anything about the problems you've solved in order to determine that you've solved them incorrectly, and that I'm smarter than everyone, and I'm going to be rich."

Let us know how that works out for you.

I didn't know I could do 25k database requests per second on a PlayStation.
I really don’t understand your argument.
25krps? As in requests per second? I.e., one request every 3 seconds?
How do you go from "25,000 requests per second" (25krps) to "one request every 3 seconds"?
Did you confuse “requests per second” (rps) with requests per day?
I did indeed!

Clearly more tired than I realised!
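For reference, the arithmetic behind the mix-up above can be sketched in a few lines. The function name is made up for illustration, and it assumes requests arrive evenly across the window:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def seconds_between_requests(requests: float, window_seconds: float) -> float:
    """Average gap between requests spread evenly over a time window."""
    return window_seconds / requests


# 25k requests per *day*: roughly one request every 3.5 seconds.
per_day_gap = seconds_between_requests(25_000, SECONDS_PER_DAY)

# 25k requests per *second*: one request every 40 microseconds.
per_second_gap = seconds_between_requests(25_000, 1)

print(f"per day:    {per_day_gap:.3f} s between requests")
print(f"per second: {per_second_gap * 1e6:.0f} us between requests")
```

So "one request every 3 seconds" matches 25k per day, which is 86,400 times less traffic than 25k rps.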

Esp. as you start splitting up your statefiles!

However, I do think that this is mostly essential complexity, rather than accidental complexity. We're now building systems that are way more secure and/or scalable than before. Least possible network access and permissions everywhere already add a bunch of complexity. Pushing complexity from our code to managed cloud offerings does its part, too. But all of this can be tamed very well with modules and reusable components.

That said, if you're scaling Terraform, I do recommend checking out the tools that have sprung up in recent years to manage it. I'll personally recommend Spacelift[0] (see disclaimer). It can help you orchestrate your statefiles once you start having many of them (even tens or hundreds of statefiles in a single workflow are no problem) using stack dependencies, help team members self-serve through blueprints, automate all the things through OPA policies, and generally help you scale your Terraform usage to a larger team.

[0]: https://spacelift.io

Disclaimer: Software Engineering Team Lead at Spacelift, so take the recommendation with a fair grain of salt; I do legitimately think it's a great product though. If you'd like to reach out, feel free to do so through the website or the contact details in my profile.

Well said. Click ops is the root of all evil, IME, unless you're a very small shop.
Click ops is great for most shops, as long as it has advanced configs available and you have at least one expert on staff for when those configs are needed.

At Netflix our goal was always to build tools where the majority of devs just check into source control and click a few buttons, but could go as far as configuring kernel tunables if necessary (but also making that as unnecessary as possible).

But the infrastructure underneath the stuff the devs use was almost certainly not click ops’d. I’m fine clicking a deployment into prod; I am not fine making advanced cloud or build configurations via a UI.
Precisely. Even very small companies, should they grow (and we hope they will) should not be doing ClickOps architecture. They will pay for it later when their team is larger and trying to sort head from tail.
Text-based interfaces still have the same issue: you can make a typo or tab-complete without checking. The major advantage seems to be that text-based tools can be versioned well through scripts. This might be fixed with a GUI version of AutoHotkey, turning these GUI interactions into a script.
Mandated peer review, planned actions, and automated risk evaluation are part of our infra pipeline. This typically doesn't exist outside of software dev style pipeline.
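As a rough idea of what an automated risk check in such a pipeline could look like (the commenter's actual setup is not described, so this is only a sketch): the snippet below scores a plan in the shape produced by `terraform show -json`, where each entry in `resource_changes` carries a `change.actions` list. The weights, the threshold, and the function names are illustrative assumptions.

```python
# Illustrative weights per planned action; these numbers are assumptions,
# not the policy of any real tool.
ACTION_RISK = {"no-op": 0, "read": 0, "create": 1, "update": 2, "delete": 5}


def risk_score(plan: dict) -> int:
    """Sum a risk weight over every planned action in the plan."""
    score = 0
    for resource_change in plan.get("resource_changes", []):
        actions = resource_change.get("change", {}).get("actions", [])
        # Unknown actions are treated as high risk rather than ignored.
        score += sum(ACTION_RISK.get(action, 5) for action in actions)
    return score


def needs_extra_review(plan: dict, threshold: int = 5) -> bool:
    """Flag plans whose total risk reaches the (assumed) threshold."""
    return risk_score(plan) >= threshold


# A hand-written example plan: one new bucket, one database being replaced.
example_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
    ]
}

print(risk_score(example_plan), needs_extra_review(example_plan))
```

A check like this would sit next to peer review, not replace it: it only decides which plans get a second pair of eyes.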
I was about to disagree, but the automated risk evaluation (ARE) part definitely qualifies your whole statement. Going to go off on a tangent: how do we introduce automated risk evaluation to environments outside of software development? Implementing ARE as software is probably the most efficient method in terms of time and resources. But ideally (some) users of ARE should be able to improve on it. In the case of "traditional" engineering (civil/electrical/chemical etc.) there are engineers who specialize in numerical methods and can improve on ARE. But what about professions where software development skills are not as widespread (or not seen as a legitimate contribution to the field)? There are still probably going to be members of these professions with software development abilities, but is there a point where other methods could be considered (i.e. electrical/mechanical methods) for ARE implementations?
I don't see the current model as a single model, but as several models interrelating.

In software, there are at least three models I can think of immediately: code, configuration and user data.

Why do I separate configuration? Isn't configuration just code or data? I don't think it is. It is data _about_ a particular system, as opposed to a particular user.

Why the distinction here? The code of a system can be designed, developed, and tested against a set of supported configurations. At that point, the system might only run under one configuration at a time, but can be trusted from a requirements perspective to operate under other configurations without needing to go through the whole software development lifecycle again.

Why not just store this in user data, then? Different requirements. Three off the top of my head: configuration data wants much better change management than most user data does. That management wants to be exportable and importable. It wants different access controls.

Historically, configuration data change management has been done in SCM, such as git. Git isn't a big deal in development because it is not a point of particularly high friction relative to the other parts of the software development lifecycle. It is a _much_ bigger point of friction in configuration changes.

Hence, three models.

We can argue about whether or not configuration changes _ought_ to go through the full cycle, because I am wrong to trust _any_ change to a system with anything less. My practical experience suggests that most of the time, the damage done is less than the cost of enforcing a strict lifecycle on everything.

To make this concrete: terraform for me has been part code, part configuration.

I define a resource, and provide a whole set of knobs on that resource. That's the code part. I test that code against a variety of configurations, the same way I might unit test application code against a variety of app configurations. I also verify that changing knobs from one setting to another behaves. With automated testing, this actually isn't all that hard to do. Once I've verified things work right, I deploy.

At this point, I will default to trusting that things will work. This is the configuration part. Set these knobs to whatever permitted value you want, and the system will update behavior based on those new values. Most of the time, things like this work. That is good enough for me.
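A minimal sketch of that split, in Python rather than Terraform so it runs on its own. The knob names, permitted values, and the `render` function are all hypothetical; the point is only that the code is tested once against every supported configuration, after which any permitted value is trusted.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical knobs and their permitted values (assumptions for illustration).
PERMITTED = {
    "instance_size": ("small", "medium", "large"),
    "replicas": (1, 2, 3),
    "public": (False, True),
}


@dataclass(frozen=True)
class Config:
    instance_size: str
    replicas: int
    public: bool


def validate(cfg: Config) -> None:
    """Reject any knob value outside the tested, permitted set."""
    for knob, allowed in PERMITTED.items():
        value = getattr(cfg, knob)
        if value not in allowed:
            raise ValueError(f"{knob}={value!r} is not one of {allowed}")


def render(cfg: Config) -> dict:
    """The 'code' part: turn a validated config into a resource description."""
    validate(cfg)
    return {
        "size": cfg.instance_size,
        "count": cfg.replicas,
        "ingress": "0.0.0.0/0" if cfg.public else "10.0.0.0/8",
    }


def all_supported_configs():
    """Enumerate every combination of permitted knob values."""
    keys = list(PERMITTED)
    for values in product(*(PERMITTED[key] for key in keys)):
        yield Config(**dict(zip(keys, values)))


def test_every_supported_config_renders():
    """Run once in CI; afterwards, changing a knob needs no new code review."""
    for cfg in all_supported_configs():
        rendered = render(cfg)
        assert rendered["count"] == cfg.replicas
        assert rendered["size"] == cfg.instance_size


test_every_supported_config_renders()
```

Anything outside `PERMITTED` fails validation, so it counts as a code change and goes back through the full lifecycle.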

You obviously have never worked in aerospace or safety-critical systems.
"Deploying every day isn't _that_ difficult"

Just don't ever ask to roll back...

Why? Keep build artifacts and deploy any build you like. Do you have a state or dependency problem?
It's quite common for one service to depend on another (or multiple others), on the network/firewall state, on configurations that might affect or be affected by other services, etc.

What looks simple when you're the king of your own little kingdom suddenly doesn't seem as simple when that fantasy meets the reality of sharing the world with others.

Same reason clocks shouldn't jump backwards, it breaks so many assumptions you end up with insanity. Do a revert commit so it's the old code in new clothing.
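One way to picture "old code in new clothing" is an append-only release log, where going back means shipping an older artifact under a new, higher release number. This is a toy sketch; the class and the artifact IDs are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseLog:
    """Append-only deploy history: release numbers only ever increase."""

    releases: list = field(default_factory=list)  # (release_number, artifact_id)

    def deploy(self, artifact_id: str) -> int:
        """Record a new release and return its number."""
        number = len(self.releases) + 1
        self.releases.append((number, artifact_id))
        return number

    def current(self) -> str:
        """Artifact of the most recent release."""
        return self.releases[-1][1]

    def revert_to(self, release_number: int) -> int:
        """Re-deploy an older artifact as a *new* release; history is never rewound."""
        if not 1 <= release_number <= len(self.releases):
            raise ValueError(f"unknown release {release_number}")
        _, artifact_id = self.releases[release_number - 1]
        return self.deploy(artifact_id)


log = ReleaseLog()
log.deploy("build-41")
log.deploy("build-42")
new_release = log.revert_to(1)  # ships build-41 again, as release 3
print(new_release, log.current())
```

Anything that assumes release numbers only go up (caches, migrations, dashboards) keeps working, which is the same reason clocks should not jump backwards.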
Sure, my bad. I was thinking of the case when the build fails in deployment, not after it is fully deployed. In that case, it should be okay to revert to the last successful build.
Great that your setup works. I personally hate terraform and try to avoid it.

k8s is also KISS, but it brings even more out of the box, like logging and monitoring. I would highly recommend taking a look; perhaps you'll like it.

Terraform's state management is bad, and a lot of people don't realize that secrets get stored in the state files. Bootstrapping this securely already needs infrastructure like remote stores.

Jenkins is fine, I would say, but with argocd you actually gain real insight. Argocd is also IaC, and you can manage argocd through argocd.

The adoption of argocd in the platforms I have built is great. Developer teams love it, get used to it very fast, and don't need cluster access (in your case, VM access).

With k8s you also get zero downtime deployment, blue / green basically for free.