| HN Mirror


by dwill-mdcloud 479 days ago

> I wouldn't say Ops wants control, it's we want to stop being paged after hours because Devs yolo stuff into production without a care in the world. Not sure tooling will fix that.

You have hit the nail on the head here. Our base hypothesis is that the only way to solve this problem is to start with a self-service approach. If I deploy an RDS instance and nobody ever connects to it, it will never have an issue. The moment a Dev starts firing N+1 queries at it, I have to get up at 1am. Developers need to have ownership and accountability for their infrastructure without having to become absolute cloud experts.
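For readers who have not met the term, here is a minimal sketch of an N+1 query pattern next to a single joined query. It uses an in-memory SQLite database, and the `authors`/`posts` schema and the `CountingDB` helper are invented for illustration; nothing here comes from the platform under discussion.

```python
import sqlite3


class CountingDB:
    """Thin wrapper (hypothetical) that counts read queries sent to SQLite."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def seed(db, n_authors):
    """Create an illustrative schema: each author gets two posts."""
    db.conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            author_id INTEGER,
            title TEXT
        );
        """
    )
    db.conn.executemany(
        "INSERT INTO authors (id, name) VALUES (?, ?)",
        [(i, f"author-{i}") for i in range(1, n_authors + 1)],
    )
    db.conn.executemany(
        "INSERT INTO posts (author_id, title) VALUES (?, ?)",
        [(i, f"post-{i}-{k}") for i in range(1, n_authors + 1) for k in (1, 2)],
    )


def titles_n_plus_one(db):
    """One query for the authors, then one more query per author."""
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        rows = db.query(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_joined(db):
    """Same data fetched with a single JOIN."""
    result = {}
    rows = db.query(
        """
        SELECT a.name, p.title
        FROM authors a LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    )
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result
```

With 50 authors seeded, `titles_n_plus_one` issues 51 queries and `titles_joined` issues one, for the same result. Against a networked database each extra query also pays a round trip, which is roughly how this pattern turns into load.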

Our goal is to enable Ops teams to build catalogs of solid building blocks that developers can build into novel architectures and safely own and operate. The collaboration between Ops and Dev is delegated to software and eases this friction.

> It looks like a functional platform and another "Cloud defaults are too scary? Here is a sane default option."

I would push back on this notion. An Ops team builds reusable modules that match their reliability and compliance requirements. You _can_ use modules we have created but we expect that you own your IaC modules. They will conform and evolve with your organization's best practices.

The "DevOps is Bullshit" article was the inspiration for building a platform that manages the relationship between Dev and Ops, which I think separates us from our competitors in the space.

2 comments

otterley 478 days ago

> Our goal is to enable Ops teams to build catalogs of solid building blocks that developers can build into novel architectures and safely own and operate. The collaboration between Ops and Dev is delegated to software and eases this friction.

What you’re describing is the Holy Grail of being able to simplify a complex system without sacrificing functionality, security, performance, cost optimization, observability, etc. This dream has been around a very long time. People keep attempting to solve the problem (because, well, there is potentially money in it) and they’ve all failed in the long run (although some vendors got wealthy until the customers saw the light). CFEngine yielded to Chef/Puppet/Ansible; Terraform is a mess; VMware is the walking dead; and K8s will probably see a similar fate once people burn out on it.

The problem is the age-old one of leaky abstractions and jammed-up controls. Sure, an abstracted database component might be adequate for a simple workload, but as soon as the component reaches the limits of the abstraction, you’re stuck. You can’t get your hands dirty and solve the problem. And if you can, you might as well throw the abstraction out and configure the component natively.

On top of that, you cannot effectively abstract the observability of complex workload components. These are often highly specialized pieces of software for which the metrics are not only different, but the values of which have different meanings.

Anyway, if you figure out how to crack this nut, I applaud you. But it’s been tried before, many ways. I’m not optimistic.


dwill-mdcloud 478 days ago

What can I say, we are dreamers who have suffered as both Ops and Dev.

I think the way we approach abstraction is fairly different from other Ops tools on the market. We are product engineers first. We are trying to pull a lot of what makes product engineering work into the ops side. Our guidance on abstraction is very use-case driven. Instead of having an S3 module, you have a public website bucket, a landing zone bucket, and a CDN bucket. This prevents the module from outgrowing its usefulness.
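To make the use-case-driven idea concrete, here is a rough sketch, not our actual module format. Every name in it (`BucketPolicy`, `CATALOG`, `BucketRequest`, `render_bucket`) and every policy value is made up for illustration. The point is only that Ops fixes the policy per use case while the developer picks a use case and a small, bounded set of knobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketPolicy:
    """Settings Ops decides per use case (illustrative values only)."""
    public_read: bool
    versioning: bool
    encryption: str


# Hypothetical Ops-owned catalog: one entry per use case, not one generic module.
CATALOG = {
    "public_website": BucketPolicy(public_read=True, versioning=False, encryption="AES256"),
    "landing_zone": BucketPolicy(public_read=False, versioning=True, encryption="aws:kms"),
    "cdn_origin": BucketPolicy(public_read=False, versioning=False, encryption="AES256"),
}


@dataclass(frozen=True)
class BucketRequest:
    """What a developer is allowed to choose."""
    use_case: str
    name: str
    retention_days: int = 30  # developer-controlled, but bounded below


def render_bucket(req: BucketRequest) -> dict:
    """Merge the developer's request with the Ops policy for that use case."""
    if req.use_case not in CATALOG:
        raise ValueError(f"unknown use case: {req.use_case!r}")
    if not 1 <= req.retention_days <= 365:
        raise ValueError("retention_days must be between 1 and 365")
    policy = CATALOG[req.use_case]
    return {
        "name": req.name,
        "retention_days": req.retention_days,
        "public_read": policy.public_read,
        "versioning": policy.versioning,
        "encryption": policy.encryption,
    }
```

A call such as `render_bucket(BucketRequest("landing_zone", "raw-events", retention_days=90))` yields a private, versioned bucket without the developer ever touching those settings, and a request for an unlisted use case is rejected instead of falling back to a generic bucket.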

If it is possible to crack this nut, we will need people like you to give us the guidance to do it. We should jump on a call, no sales. I can walk you through the platform and you can tell us why we are crazy for trying!


sgarland 479 days ago

> Developers need to have ownership and accountability for their infrastructure without having to become absolute cloud experts.

This will never happen. You can’t own something you don’t understand.

Ops and Dev are different roles for a reason, and the only reason we’ve shifted away from that is to accelerate profits; yes, you can spend your way to growth, and yes, you can run massively complex systems on hardware you have never seen, nor understand. That doesn’t make it a good idea.


dwill-mdcloud 479 days ago

I am unsure what you are getting at. Our platform is all about Ops and Dev being separate jobs, but operating at a greater scale. Ops produce modules that can be consumed by Developers. Developers control the scale of their infra while Ops encodes security and compliance in the module. It seems like this would be the ideal model in this instance. If it is not, what does the ideal look like in your opinion?


sgarland 479 days ago

Just as choosing the correct algorithm, the best library, or the optimal design pattern for a given task is complex and important, so is choosing, configuring, and managing infra.

The hyperscalers have convinced people that you don’t need to know how to run a database, you can just use RDS et al. You don’t need to know how to manage K8s, you can just use EKS. This is dangerously untrue, because those tools are good enough that most people can get them going and they’ll work reasonably well, right up until they don’t (especially RDBMS). Then you hit edge cases that require you to have a solid understanding of their architecture, as well as Linux administration – for example, understanding how Postgres’ bgwriter operates, and how it is affected by disk IOPS and latency, not to mention various kernel-level tunings. None of this matters in the slightest with small DBs, say, < 100 GiB. It may not even matter at larger scales depending on your query patterns.

The various DB offerings (I’m going heavily on the DB example because that’s my current job) like Neon and PlanetScale mostly have the right idea, IMO – stop assuming devs know or want to do ops work. They want an endpoint, and they want things to be performant, so have automatic schema and index reviews. Critically, in this model, devs are neither responsible nor accountable for the infra’s performance (more or less; discussions on schema design and its impact on DB performance aside). In other words, this has separated ops and dev.

I say they’ve mostly got it right because they do still allow you to make bad decisions, like chucking everything into a JSONB column, using UUIDv4 as a PK, etc. Realistically a service would fail if they tried refusing to allow that flexibility, so I get it.
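As a rough illustration of the UUIDv4 point: in a B-tree index ordered by key, a sequential key always lands at the right-hand edge, while a random key lands somewhere in the middle. The sketch below only counts how many keys arrive out of order; the `out_of_order_inserts` helper is invented for this example, and the real cost depends heavily on the engine, the storage layout, and how much of the index fits in memory.

```python
import uuid


def out_of_order_inserts(keys):
    """Count keys that sort below the largest key seen so far.

    In a key-ordered index these are the inserts that cannot simply be
    appended at the end.
    """
    count = 0
    largest = None
    for key in keys:
        if largest is not None and key < largest:
            count += 1
        else:
            largest = key
    return count


def compare(n=1000):
    """Return (sequential, random) out-of-order counts for n inserts."""
    sequential = out_of_order_inserts(range(n))
    random_keys = out_of_order_inserts(uuid.uuid4() for _ in range(n))
    return sequential, random_keys
```

Sequential integers give a count of zero. With random UUIDv4 keys nearly every insert is out of order, since a new random key only rarely exceeds every key seen before it.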

For an in-house solution, though, this can be the case. The old school, rigid mentality of having to ask cranky graybeards to add a column had an extremely powerful benefit: bad decisions were much less likely to occur. Yes, it’s slower, and that’s a good thing. Everywhere I’ve been, bad decisions made in the name of velocity have come calling, and it’s a nightmare to fix them.

In summary, I like the idea of Massdriver quite a bit, actually; I just don’t think it’s a good idea, or accurate, to say that it allows devs to be responsible for their own infra, because they largely don’t want that, nor are they capable. Not for lack of intelligence, but lack of experience. Let specialists be specialists. You want a DB? Ask the DB team, and don’t get mad when they tell you your proposed schema is hot garbage. You want more compute? Ask the infra team, and don’t be surprised when they ask you if you’ve profiled your code to determine if you actually need more.


dwill-mdcloud 478 days ago

That's a fair assessment of the problem. I appreciate the thought you put into it. I think compiling all of those experts will be difficult for companies until they are fairly large. We hope to push out requiring that expertise for as long as possible. But in an ideal world, I don't disagree with you.
