by TankeJosh 420 days ago
100% agree with everything you’re saying here. This tool is designed to make the infra / DevOps person more effective: they spend less time on tedious tasks and can instead focus on the high-level architecture and cost of their infrastructure. The deployment feature is mainly for testing configurations in dev / staging environments before integrating with GitOps.

The workflow we imagine is an infra team either managing everything on their own, or providing private Terraform modules that the rest of their developers can ask the agent to configure for them. Teams that go this developer self-service route can also set custom validation rules and default configurations that are shared across the team.
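To make the self-service idea concrete, here is a minimal Python sketch of shared team defaults plus custom validation rules applied to a developer's module request. It is only an illustration under assumptions, not how infra.new actually works: the config keys, `TEAM_DEFAULTS`, `ALLOWED_REGIONS`, and the `apply_defaults` / `validate` helpers are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Config = Dict[str, Any]
# A rule inspects a merged config and returns an error message, or None if it passes.
Rule = Callable[[Config], Optional[str]]

# Hypothetical defaults shared across the team.
TEAM_DEFAULTS: Config = {
    "region": "us-east-1",
    "deletion_protection": True,
    "tags": {"managed_by": "terraform"},
}

# Hypothetical policy: only these regions may be used.
ALLOWED_REGIONS = {"us-east-1", "us-west-2"}


def apply_defaults(request: Config, defaults: Config = TEAM_DEFAULTS) -> Config:
    """Fill in team defaults; values in the developer's request win."""
    merged = {**defaults, **request}
    # Merge tags key by key so team tags survive alongside developer tags.
    merged["tags"] = {**defaults.get("tags", {}), **request.get("tags", {})}
    return merged


def allowed_region(cfg: Config) -> Optional[str]:
    region = cfg.get("region")
    if region not in ALLOWED_REGIONS:
        return f"region {region} is not allowed"
    return None


def deletion_protection_on(cfg: Config) -> Optional[str]:
    if not cfg.get("deletion_protection"):
        return "deletion_protection must be enabled"
    return None


RULES: List[Rule] = [allowed_region, deletion_protection_on]


def validate(cfg: Config, rules: List[Rule] = RULES) -> List[str]:
    """Run every rule and collect the failures (an empty list means valid)."""
    errors = []
    for rule in rules:
        message = rule(cfg)
        if message is not None:
            errors.append(message)
    return errors
```

Keeping each rule as a small function that returns an error string lets a team add its own checks without touching the merge logic.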

Looking back on our launch post, we definitely did not make it clear enough who this product was built for. I’d love to hear if there are any aspects of the tool that you think would be helpful for your work!


None that I can think of. Maybe an LLM that knows Terraform would help me out, BUT I'm not sure I'd pay for it, because most Ops people don't spend their entire lives writing Terraform. They write a few modules, publish them, update them occasionally, and that's it.

Most of my pain points are the hard political stuff, which is to say getting Devs/Leadership to care even a little about Ops. Outside FAANG and FAANG-type companies, most companies are fine with Devs going "Light is green, trap is clean" and not caring about the containment field (Ops). Paging me out at 2AM is not something Devs can get in trouble for.

This is a common thing I see with most FAANG founders. People coming from Google Operations think, OK, it's probably 10x worse than the horrors I saw. No, it's much worse.

Maybe we should add a special agent mode to help with planning internal politics strategy ;)

I am curious what the handoff looks like between you and the devs you work with. Do they self-serve using the modules you publish? Or is there some sort of dev portal that abstracts away Terraform?

For infrastructure, we are moving to a single monorepo where they open PRs against the repo and GitHub Actions (GHA) runs them. Most of the time, they are using modules we wrote, but we don't run a lot of cloud-native stuff. Most of it is databases, possibly Redis, maybe storage, occasionally Pub/Sub. Modules are supposed to load up alerting rules and route them to the owning teams in PagerDuty, but that doesn't always happen, and those alerts fall through to us. I'd say most teams' infrastructure changes only once or twice a year, except when a new project is getting spun up.
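The alert routing described above (module rules are supposed to page the owning team, but anything unrouted falls through to Ops) can be sketched as a lookup with a fallback. This is a hypothetical Python illustration only: the team names, the `ROUTES` table, and the `ops-oncall` service name are made up, and it does not call the PagerDuty API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical PagerDuty service name for the Ops team's on-call rotation.
OPS_FALLBACK = "ops-oncall"

# Hypothetical mapping from owning team to that team's PagerDuty service.
ROUTES: Dict[str, str] = {
    "payments": "payments-oncall",
    "search": "search-oncall",
}


@dataclass
class Alert:
    name: str
    owner_team: Optional[str]  # None when the module never set an owner


def route(alert: Alert, routes: Dict[str, str] = ROUTES) -> str:
    """Return the service to page; anything without a known owner falls through to Ops."""
    if alert.owner_team is None:
        return OPS_FALLBACK
    return routes.get(alert.owner_team, OPS_FALLBACK)
```

Both failure modes, a missing owner and an owner with no registered route, end up paging Ops, which is exactly the "fall through to us" behavior.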

Applications can be put into a container and tossed into Kubernetes. We use Kustomize + templates for most applications, but occasionally those need to be modified. I'd say that happens once a week.

The other option is an ungodly Chef setup that deploys their applications from Jenkins. We actually package their system up as a .deb package that is pushed to a subset of boxes, which is an absolute nightmare I luckily don't have to deal with. We went full "Write your own Kubernetes." Never go full "Write your own Kubernetes." (https://www.macchaffee.com/blog/2024/you-have-built-a-kubern... NOT THE AUTHOR)

The handoff is either a container or "My application builds on Jenkins." If everything is running normally, there is nothing to hand off. It's when it's not that I get paged, and the lack of a handoff becomes frustrating. We are also isolated to our group.

Thanks for sharing, this is really helpful info!

In its current form, infra.new would probably be most helpful when setting up new projects or migrating any old apps to this single monorepo setup, but it also sounds like Terraform isn't a huge pain point for your team.

I am interested to learn if we can help with these 2am pages though. Are those set up by you? Or the developers? Would an agent that helps improve observability / alerts configuration be interesting to you?

>I am interested to learn if we can help with these 2am pages though. Are those set up by you? Or the developers?

Could be me or the developers. Sometimes it's my infrastructure acting up (thanks, Azure, for that failed Kubernetes upgrade). Or it could be that a dev team ran into something and paged the Ops team because: A) maybe it's infrastructure; B) Ops teams tend to have the best troubleshooters, it's something in our Ops DNA; C) they can, and their managers never want to explain "Well, we found it was DNS, but because Ops was not on the call, it took 15 minutes for us to wake them up"; D) they likely need our support to run a one-off Kubernetes Job, rush out a deployment, or some other such thing.

> Would an agent that helps improve observability / alerts configuration be interesting to you?

That's what Datadog has already sold us (I'm not impressed), so it's a crowded marketplace. ;) I'm personally not in the market for anything, so I'm not a potential customer. If you are looking for another pivot, please, for the love of all that is holy, have it plug into Prometheus (PromQL) natively. If I have to set up another beeping sidecar to deal with logs and metrics, I'm going to hurt someone. Also, hooking logs up to some LLM/AI is a terrible idea, don't even think about it.
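As a rough idea of what plugging into Prometheus natively could look like, here is a minimal Python sketch that talks to Prometheus's HTTP API directly instead of through an extra agent or sidecar. It uses the instant-query endpoint `GET /api/v1/query` and its JSON response shape; the server address, the example PromQL query, and the helper names are assumptions for illustration.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

Sample = Tuple[Dict[str, str], float]


def instant_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> List[Sample]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def query(base_url: str, promql: str) -> List[Sample]:
    """Run a PromQL instant query against a live Prometheus server."""
    with urlopen(instant_query_url(base_url, promql)) as resp:
        return parse_vector(json.load(resp))
```

For example, `query("http://prometheus:9090", "up == 0")` (hypothetical address) would list the scrape targets that are currently down, with no separate log or metrics shipper involved.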