by markbnj
2729 days ago
I'm wary of the operator model in general, and we haven't had great success using operators to deploy complex stateful services in our clusters. But to be honest, we haven't had great success deploying them with off-the-shelf charts from helm stable either. One of our k8s stateful services is a large Elasticsearch cluster indexing about 150M events per day, and we had to fork and heavily modify the chart to get it right. I feel that complex stateful services often have enough devils in the details that trying to implement them through an abstraction gets you into trouble. Operators aspire to be a "smart agent" that can translate a CRD resource declaration into a functioning thing, letting you implement your data store at an even higher level of abstraction than a helm chart provides. In my experience charts are themselves too abstract for this purpose: you either end up forking and modifying them or, if the chart actually provides full coverage of the configuration options, you get a whole new hard-to-comprehend API to the k8s resources you're trying to deploy. So I'm not that excited about having a back-end Clippy that can do it for us. It's probably fine for simple use cases, especially those where you often need to create and destroy simple dbs, but imo not yet for large production use cases.
|
Unfortunately, we aren’t there yet for most software. Let’s take Postgres as an example. Today you have to manage your pg database manually (or use a service that manages it for you), but that’s just because the right automation software hasn’t been built yet. Someday, a Kubernetes Operator (or equivalent implementation) will exist that can manage a large Postgres cluster better than a team of DBAs. It’s crazy that there are hundreds (thousands?) of configuration parameters in Postgres, and that these are coupled to operating system settings in weird and unexpected ways that most people don’t know about. We should be building this knowledge into a K8s Operator and letting it control our postgresql.conf and OS configuration, instead of handing that control to a team of humans who might be able to put in some sane defaults, but will always be chasing optimal performance as usage patterns change.
This exists in some places already. For example, Rook is a K8s operator that provisions and manages Ceph in a Kubernetes cluster. As a small startup, if I need this functionality, I don’t want to hire a full-time Ceph admin to figure it out, and I don’t have the expertise to take on operating Ceph myself. Rook productized operating Ceph for us, and “baked in” all of the knowledge needed to manage block and object storage and even set up concurrent, shared file systems. I trust Rook to manage Ceph, and I don’t think I could do a better job with human intervention.
We have a long way to go. Operators might help get us there, but they are just one pattern we can use. One thing is for sure: we shouldn’t assume that human control over complex software is required to achieve optimal performance.
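To make the "build the knowledge into an Operator" idea a bit more concrete, here is a minimal sketch in Python of what encoding tuning knowledge might look like. Everything in it is an assumption for illustration: the names `NodeResources`, `derive_pg_settings`, and `reconcile` are hypothetical, and the only rules encoded are the common starting-point heuristics of setting `shared_buffers` to about 25% of memory and `effective_cache_size` to about 75%. A real operator would watch a custom resource through the Kubernetes API and would need far more than two settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResources:
    """Hypothetical description of the machine Postgres runs on."""
    memory_mb: int


def derive_pg_settings(node: NodeResources) -> dict[str, str]:
    """Derive a few Postgres settings from the node's memory.

    Assumed heuristics (starting points, not tuned values):
    shared_buffers ~ 25% of RAM, effective_cache_size ~ 75% of RAM.
    """
    shared_buffers_mb = node.memory_mb // 4
    effective_cache_size_mb = node.memory_mb * 3 // 4
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "effective_cache_size": f"{effective_cache_size_mb}MB",
    }


def reconcile(current: dict[str, str], desired: dict[str, str]) -> dict[str, str]:
    """Return only the settings whose current value differs from the desired one."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


if __name__ == "__main__":
    desired = derive_pg_settings(NodeResources(memory_mb=16384))
    changes = reconcile({"shared_buffers": "128MB"}, desired)
    for key, value in changes.items():
        print(f"{key} = {value}")
```

The point of the sketch is the shape rather than the numbers: the desired configuration is computed from observed facts about the environment, and the reconcile step applies only the difference, so the same logic can rerun whenever the node or the workload changes.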