by BenElgar 2221 days ago
Author here. At the last two companies I've worked at we really needed—and didn't have—a solution for spinning up throwaway Kubernetes clusters that we could use for testing and development. Krucible is an attempt to solve that problem.

We've just released a really cool feature called Snapshots that allows you to image a running Kubernetes cluster, including the state of all applications, and then create new clusters from that image. It's great for creating consistent development environments or quick starting test environments.

Happy to answer any questions people might have.

4 comments

> Krucible is an attempt to solve that problem.

For me, this was always ops smell - why do devs need to spin up k8s clusters? As long as you're not working on some low-level k8s features (your own operator, or testing cluster-wide resources, or developing k8s components themselves), then why not use a 'real' cluster for testing? k8s multitenant/process isolation is definitely good enough for semi-trusted users like developers, as long as you take sensible measures (ephemeral low-priv namespaces, podsecuritypolicies, networkpolicies, quotas, etc).
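
To make that list a bit more concrete, here is a minimal sketch of a per-developer namespace with a quota and a default-deny ingress policy. It builds the manifests as plain Python dicts and prints them as JSON, which `kubectl apply -f -` accepts. The namespace name `dev-alice` and the quota numbers are made-up placeholders, not recommendations. RBAC and pod security are left out, and the network policy only has an effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json


def dev_namespace_manifests(name, cpu="4", memory="8Gi", pods="20"):
    """Return Namespace, ResourceQuota and default-deny NetworkPolicy manifests.

    The default quota values are illustrative assumptions only.
    """
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "dev-quota", "namespace": name},
        # Caps the sum of all pod requests and the pod count in the namespace.
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory, "pods": pods}},
    }
    deny_ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": name},
        # An empty podSelector selects every pod; listing Ingress with no
        # ingress rules blocks all inbound traffic to those pods.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }
    return [namespace, quota, deny_ingress]


def to_json_list(manifests):
    """Wrap the manifests in a v1 List so they can be applied in one go."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)


if __name__ == "__main__":
    print(to_json_list(dev_namespace_manifests("dev-alice")))
```

Piping the output into `kubectl apply -f -` would create all three objects; deleting the namespace afterwards removes the quota and policy with it.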

From what I've seen reviewing k8s clusters some of the items in your "sensible measures" list aren't considered easy to manage and deploy.

In particular, good RBAC design that doesn't end up leaking information across namespaces, PSPs that are flexible enough for developers but strict enough to prevent privesc, and strong network policies all present challenges.

For those less mature organizations, a solution like this might present an easier option.

I don't want to gatekeep, but in my opinion organizations that can't afford to set this up correctly _likely_ shouldn't be running k8s in the first place.

From my experience with companies that haven't done their organizational or engineering homework: half-assedly deploying Kubernetes generally ends up being an unmaintainable disaster.

One of the high-return-value aspects of k8s is having little clusters available to multiple tenants. Without this in place k8s really stops making sense, being too complex for its actual use case - so you might be much better off using something simple like Nomad.

Whether they (the user) should (run Kubernetes) or not, there are many systems and products out there that are meant to make Kubernetes as accessible as possible.

In fact, the goal of many solutions (GKE, AKS, EKS, etc) is "We manage the entire cluster for you, just deploy your workloads!".

In many situations, if a company is running a single application in their cluster, many of these management aspects (networkpolicies, quotas, etc) are not at all necessary for their use case.

You say they shouldn't be running k8s in the first place, and I half agree with you. They don't _need_ to be running k8s. Large platforms have done a lot of work to make "Run in a Kubernetes Cluster" as approachable as "Run in Heroku".

Regarding Nomad, sure, but if someone hasn't done their engineering homework, the chance that they are even familiar with Nomad is slim (no offense to Nomad)

Edit: A bit of clarity in the first sentence

this this this.

If you have to pay someone else for k8s and cannot do some of the management yourself, then stop. You do not need it to begin with.

Nothing against the original poster's idea/company. Looks like a good idea to me.

> k8s multitenant/process isolation is definitely good enough for semi-trusted users like developers

I think it's reasonable for everyone to have their own cluster if their day-to-day work involves developing cluster-scoped resources. Admission webhooks, service meshes, CNI plugins, etc.

I agree that most people have too many clusters, using them to separate batch and interactive jobs or staging and production jobs. Generally, computers are expensive and manually scheduling jobs to clusters reduces utilization, so it costs a lot of money. It also gives you less operational flexibility, like being able to throttle batch jobs to scale up interactive jobs.

Really it depends what you're testing and deploying. For instance, if you're developing a new microservice, it's helpful to be working with the same service discovery mechanism that is running in your production cluster. When you're testing, presumably you also want to test any Kubernetes changes that you make before you deploy to your production servers. Krucible is ideal for that.

But again, why not just deploy your microservice to a development namespace on a shared cluster? It's going to be the closest thing to production, with hopefully only a few flags changing.

By having a dedicated Kubernetes cluster you reduce the blast radius. For instance, if you roll out a new version of your services that unexpectedly consumes a significant quantity of resources, and you were running that on your production cluster, it could interfere with your production workload.

In a similar vein, if you have 10 or even 100 end-to-end test suites, with Krucible you could run them all in parallel, significantly reducing the time taken, without fear of them impacting each other. In your shared cluster scenario you would be limited by the size of your cluster.
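
A rough sketch of that fan-out pattern, one throwaway cluster per suite. `create_cluster` and `delete_cluster` are hypothetical stubs standing in for whatever call provisions and destroys a cluster; they are not a real client API, and the "test run" is just a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def create_cluster(suite):
    # Hypothetical stand-in: a real version would call a provisioning API
    # and return connection details for the new cluster.
    return f"cluster-for-{suite}"


def delete_cluster(cluster):
    # Hypothetical stand-in for tearing the cluster down again.
    pass


def run_suite(suite):
    """Run one suite against its own cluster and return (suite, passed)."""
    cluster = create_cluster(suite)
    try:
        # Placeholder for the real test run against `cluster`.
        return suite, True
    finally:
        # Always clean up, even if the suite raises.
        delete_cluster(cluster)


def run_all(suites, max_parallel=10):
    """Run every suite concurrently, at most max_parallel at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_suite, suites))


if __name__ == "__main__":
    print(run_all([f"suite-{i}" for i in range(10)]))
```

Because each suite gets its own cluster, the only shared limit is `max_parallel`, not the capacity of a single cluster.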

> By having a dedicated Kubernetes cluster you reduce the blast radius.

Kubernetes supports resource requests and resource quotas to combat this. You should be protecting your production workloads this way anyway.

> In your shared cluster scenario you would be limited by the size of your cluster.

On the other hand, with a shared cluster, it makes sense to dedicate more resources to it, and share it across both developers and CI systems.

> Kubernetes supports resource requests and resource quotas to combat this. You should be protecting your production workloads this way anyway.

That's certainly good advice and would significantly reduce the likelihood of issues but it doesn't handle all cases. For instance it's not particularly easy to quota network bandwidth.

Ultimately all of these problems are likely solvable—we just think that Krucible is easier, simpler and safer.

There are certain things that just can't be tested while scoped to a namespace; many CRDs are a good example of that. If your service rollout contains a CRD, and it is rolled out to the same cluster as production, you are going to impact production.
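
To illustrate why, here is a minimal sketch of a CRD manifest written as a Python dict. The `Widget` kind and the `example.com` group are invented for the example. Even though the custom resources it defines are namespaced (`scope: Namespaced`), the definition object itself carries no namespace: it is registered for the whole cluster, so changing it affects every namespace at once.

```python
import json

# Hypothetical CRD; the group, kind and names are made up for illustration.
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    # No "namespace" key: the definition is a cluster-scoped object.
    # The name must be "<plural>.<group>".
    "metadata": {"name": "widgets.example.com"},
    "spec": {
        "group": "example.com",
        # Instances of Widget live in namespaces, the definition does not.
        "scope": "Namespaced",
        "names": {"plural": "widgets", "singular": "widget", "kind": "Widget"},
        "versions": [
            {
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {
                    "openAPIV3Schema": {
                        "type": "object",
                        "x-kubernetes-preserve-unknown-fields": True,
                    }
                },
            }
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps(crd, indent=2))
```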

How do snapshots work? Is it simply taking snapshots of the underlying VMs or are you doing it at a Kubernetes resource level?

Also, can you describe the performance (cpu / memory) of the clusters? Obviously if I am running lots of pods that have:

  resources:
    requests:
      memory: 512Mi
      cpu: 500m
allocated, I could run into problems if the underlying servers don't have enough CPU / memory.
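
As a back-of-the-envelope illustration of the concern, this sketch estimates how many pods with those requests fit on a single node. The node size (2 CPUs, 4Gi) is an assumed example, not a known figure for any real cluster, and the estimate ignores whatever the system components reserve, so the real number would be lower.

```python
def parse_cpu_millicores(value):
    """Convert a Kubernetes CPU quantity such as '500m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_mib(value):
    """Convert a memory quantity with a Mi or Gi suffix to MiB.

    Other suffixes are deliberately not handled in this sketch.
    """
    if value.endswith("Gi"):
        return int(float(value[:-2]) * 1024)
    if value.endswith("Mi"):
        return int(value[:-2])
    raise ValueError(f"unsupported memory quantity: {value}")


def pods_that_fit(node_cpu, node_memory, pod_cpu="500m", pod_memory="512Mi"):
    """Return how many pods fit, limited by whichever resource runs out first."""
    by_cpu = parse_cpu_millicores(node_cpu) // parse_cpu_millicores(pod_cpu)
    by_memory = parse_memory_mib(node_memory) // parse_memory_mib(pod_memory)
    return min(by_cpu, by_memory)


if __name__ == "__main__":
    # Assumed node: 2 CPUs and 4Gi of memory.
    print(pods_that_fit("2", "4Gi"))  # 4, because CPU runs out before memory
```

With those assumed numbers the scheduler would leave a fifth pod pending, which is the situation the question is getting at.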

What's the advantage of this over Minikube?

Krucible is hosted, so you're not running a VM that's consuming resources and that you have to manage; spinning up Minikube from within a CI environment isn't necessarily the easiest of tasks. With Krucible it's just a single API call to create a Kubernetes cluster. This also allows you to parallelise your test suite easily as you can spin up as many clusters as you need at the same time.

The snapshots feature is also a big differentiator: you can set up a cluster, take a snapshot and then share that with your team so that you're all running identical Kubernetes clusters.

How do you think this compares with Kind, which seems to be the most popular way to spin up throwaway clusters?