by rjst01 814 days ago
A year ago I left my job at a 500-employee SaaS business working on the team that maintains the devops infrastructure, to found a startup. For me the biggest pain point is going from nothing to a sufficiently flexible devops setup.

There are a lot of great tools out there, but making them play well together is an exercise for the reader. There are also a lot of preference-based choices you need to make about how you want your setup to look, and what you choose will affect which tools make sense for you.

Do you go monorepo or polyrepo? If you go monorepo, how do you decide what to build and deploy on each merge? If you go polyrepo, how do you keep stuff in sync between any code you want to share?

Once a build is complete, how do you trigger a deployment? How does your CI system integrate with your deployment system, or is the answer "with some shell scripts you have to write"?

> How do you deploy resources?

For us, we have a monorepo setup with bazel. I wrote some fairly primitive scripts to scan git changes to decide what to build. We use Buildkite for CI, which triggers rollouts to kubernetes with ArgoCD. I had to do a non-trivial amount of work to tie all this together, but it's been fairly robust and has only needed a minimal amount of care and feeding.
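To make the "scan git changes to decide what to build" step concrete, here is a minimal sketch of one way it could work. It is not the scripts described above: the directory layout, the `SERVICE_DIRS` and `SHARED_PATHS` mappings, and the function names are all hypothetical, and it assumes the changed paths come from `git diff --name-only`.

```python
import subprocess

# Hypothetical layout: each service lives under its own directory.
SERVICE_DIRS = {
    "services/api": "api",
    "services/worker": "worker",
}
# Hypothetical shared paths: a change under any of these rebuilds everything.
SHARED_PATHS = ("libs", "WORKSPACE")


def changed_files(base="origin/main"):
    """List paths changed between the merge base with `base` and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def is_under(path, directory):
    """True if `path` is `directory` itself or sits anywhere below it."""
    return path == directory or path.startswith(directory.rstrip("/") + "/")


def affected_services(paths, service_dirs=SERVICE_DIRS, shared_paths=SHARED_PATHS):
    """Return the sorted names of services touched by the changed paths."""
    affected = set()
    for path in paths:
        if any(is_under(path, shared) for shared in shared_paths):
            # Shared code changed, so every service is considered affected.
            return sorted(service_dirs.values())
        for directory, name in service_dirs.items():
            if is_under(path, directory):
                affected.add(name)
    return sorted(affected)
```

In CI this would be called as `affected_services(changed_files())`, and the result would decide which builds and rollouts to trigger. A path-prefix scan like this is deliberately crude; it knows nothing about the real dependency graph, which is the trade-off of keeping the script primitive.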

> How do you define architecture?

Kubernetes charts for our services are in git, but there's some amount of extra stuff deployed (ingress controller, for example) that is documented in a text file.

> How do you manage your environments

We don't need to deploy environments super often, so just do it manually and update documentation in the process if any variations are needed.

> observability

Datadog and sumologic.

Overall our setup doesn't come close to the setup I worked on at my last employer, but I have to balance time spent on devops infra with time spent on the product, and that setup took ~5 full time engineers to maintain.

2 comments

> but there's some amount of extra stuff deployed (ingress controller, for example) that is documented in a text file

Out of curiosity, why just the "readmeware" for those components? I can't think of a single thing that requires clickops in a modern k8s setup. In the beginning we even used to bring up the full stack from nothing based on a single CFN template: roles, load balancer, auto-scaling group, control plane, CSI driver (this was back when EKS was a raging tire fire), and then lay the actual business apps on top of it. The whole process took about 8 minutes from go.

If nothing else, one will want to be cautious about readmeware components in disaster recovery situations. If no one has run those steps in 6 months and then there's some kind of "all hands on deck" situation, the stress will likely make that institutional knowledge leak out of their ears.

> Out of curiosity, why just the "readmeware" for those components?

Because there are so few of them. Our setup has an ingress controller and a certificate manager, and then some bookkeeping like copying the container registry credentials into every namespace.
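For what it's worth, the credential-copying bookkeeping is small enough to script. The following is only a sketch under assumptions, not what we actually run: it takes a Secret manifest already loaded as a dict (for example from `kubectl get secret ... -o json`), and the function names and field list are illustrative.

```python
import copy

# Metadata fields populated by the API server; a copy destined for another
# namespace should not carry them over.
SERVER_FIELDS = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "managedFields",
    "ownerReferences",
    "selfLink",
)


def secret_for_namespace(secret, namespace):
    """Return a copy of a Secret manifest retargeted at `namespace`."""
    clone = copy.deepcopy(secret)
    metadata = clone.setdefault("metadata", {})
    for field in SERVER_FIELDS:
        metadata.pop(field, None)
    metadata["namespace"] = namespace
    return clone


def replicate(secret, namespaces):
    """Build one manifest per target namespace, skipping the source one."""
    source = secret.get("metadata", {}).get("namespace")
    return [secret_for_namespace(secret, ns) for ns in namespaces if ns != source]
```

Each resulting dict could then be serialized to JSON and fed to `kubectl apply -f -`, which would turn this readme step into something repeatable.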

> I can't think of a single thing that requires clickops in a modern k8s setup

Absolutely agree.

> The whole process took about 8 minutes from go

How long to do the development and testing of the template, and what size is your team?

Don't get me wrong, I'm not happy about this situation. Besides the DR concern you raise, we can't quickly spin up short-lived clones of our infra for testing complex changes, so we test them in our staging environment and have to block prod deploys until we're either happy with the change or decide to roll it back. At a larger org this would be a major headache, but at our current size it does not matter.

> We don't need to deploy environments super often, so just do it manually and update documentation in the process if any variations are needed.

So is it that you don't need to deploy often, or that you deploy less often because the system is more stable that way? I mean, is it by choice or because of some tool or expertise limitation?

> Also, for architecture stored in text files - does that cause any problems for you?

We deploy our services on every merge (multiple times a day) from CI assuming the tests pass. But we don’t re-create our Kubernetes infra when we do that. I set up 3 Kubernetes clusters (staging, prod, internal apps) when I went full-time and have barely touched them other than to apply updates.

> does that cause any problems for you?

Nope. Our Kubernetes setup is just about as simple as a Kubernetes setup can be. I entertained the idea of going with something else, because we definitely don’t need all Kubernetes has to offer. But I settled on it because it’s what I know best, and the overhead and risk of something new would have exceeded the cost of the baggage Kubernetes brings that we don’t need.

If our requirements were different or if I were making regular changes, we would be in a very different spot. But as it stands today, it is just not a priority.