Kelsey Hightower had a good take in https://twitter.com/kelseyhightower/status/12963119515307048... "AWS Controllers for Kubernetes is pretty dope. You can leverage Kubernetes to manage AWS resources such as API gateways and S3 buckets. Think Terraform but backed by Kubernetes style APIs and "realtime" control loops."
Every Kubernetes shop I've seen uses Helm to deploy their workloads, & now you can set up all your S3 Buckets, SQS, SNS stuff in that same Helm chart as your app. This is hella ideal.
I also just generally love the thought of being able to manage any cloud resources at all via standard open-protocols & systems. Compounding the investment, rather than having to invest in a bunch of specific not-interconnected areas is going to lead to great things.
It'll be interesting to see what, if any, architectural flourishes or innovations went into ACK's control loops.
Deletion is typically specified by an immutable finalizer on the resource. For example, you can create a StorageClass that saves your PersistentVolume when you delete the PersistentVolumeClaim, and you can't change the class of a claim, so even if your templating goes haywire and deletes your volume claim, the actual data is safe and can be recovered. I have to imagine that any API that lets you create things like RDS instances would have similar protection. (You might think that the k8s world is very mutable, but there are some immutable things hanging around. For cases exactly like this.)
I don't use AWS so I didn't look into this thoroughly, but the word "finalizer" does show up in the code frequently:
// MarkManaged places the supplied resource under the management of ACK.
// What this typically means is that the resource manager will decorate the
// underlying custom resource (CR) with a finalizer that indicates ACK is
// managing the resource and the underlying CR may not be deleted until ACK
// is finished cleaning up any backend AWS service resources associated
// with the CR.
I dunno how good it is, but clearly they've thought about it. Test it before you invest billions of dollars into it, though.
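For what it's worth, here is how I read the mechanism that comment describes, as a toy Go model. This is a sketch under assumptions, not ACK or client-go code: the Store, Resource, and Reconcile names and the finalizer string are all made up for illustration. The point is only that a delete request on an object with a finalizer marks it as deleting, and the object goes away once the controller has cleaned up and removed the finalizer.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizerName is a hypothetical name, not the string ACK actually uses.
const finalizerName = "example.com/cleanup"

// Resource is a toy stand-in for a Kubernetes custom resource.
type Resource struct {
	Name       string
	Finalizers []string
	Deleting   bool // models metadata.deletionTimestamp being set
}

// Store is a toy stand-in for the API server's storage.
type Store struct {
	objects map[string]*Resource
}

func NewStore() *Store { return &Store{objects: map[string]*Resource{}} }

func (s *Store) Create(r *Resource) { s.objects[r.Name] = r }

func (s *Store) Exists(name string) bool {
	_, ok := s.objects[name]
	return ok
}

// Delete mimics the API server: while finalizers are present the object
// is only marked as deleting; otherwise it is removed right away.
func (s *Store) Delete(name string) error {
	r, ok := s.objects[name]
	if !ok {
		return errors.New("not found: " + name)
	}
	if len(r.Finalizers) > 0 {
		r.Deleting = true
		return nil
	}
	delete(s.objects, name)
	return nil
}

// Reconcile mimics a controller: for an object marked as deleting it runs
// the backend cleanup, then drops its finalizer so the object can go away.
func (s *Store) Reconcile(name string, cleanup func(*Resource) error) error {
	r, ok := s.objects[name]
	if !ok || !r.Deleting {
		return nil
	}
	if err := cleanup(r); err != nil {
		return err // finalizer stays, so the object stays and we can retry
	}
	r.Finalizers = without(r.Finalizers, finalizerName)
	if len(r.Finalizers) == 0 {
		delete(s.objects, name)
	}
	return nil
}

// without returns a copy of list with every occurrence of s removed.
func without(list []string, s string) []string {
	out := make([]string, 0, len(list))
	for _, v := range list {
		if v != s {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	s := NewStore()
	s.Create(&Resource{Name: "my-bucket", Finalizers: []string{finalizerName}})

	_ = s.Delete("my-bucket")
	fmt.Println("exists after delete request:", s.Exists("my-bucket"))

	_ = s.Reconcile("my-bucket", func(r *Resource) error {
		fmt.Println("cleaning up backend resource for", r.Name)
		return nil
	})
	fmt.Println("exists after reconcile:", s.Exists("my-bucket"))
}
```

Note that nothing in this model stops the delete request itself; it only orders the cleanup before the object disappears.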
Finalizers don't do much for safety. They are simply there to ensure the controller (in this case ACK) won't miss the deletion event and leave the resource dangling. To actually prevent an object from being deleted you need a validating admission webhook.
An interesting part of the Operator Lifecycle Manager (OLM) is the capability to scaffold webhooks for Operators, rotate their secrets, etc., to make this type of protection easy for everyone to provide. All you need to do is bring your validation logic specific to the app.
This is, in my opinion, the most dangerous thing about CRDs, especially for resources that are not scoped to the cluster.
It is way too easy to accidentally delete something you need. PVs were my first experience dealing with this (a chart upgrade recreated a PVC, the old one unbound, and was immediately cleaned up), and it's not a risk that I want to see extended to buckets, RDS instances, etc.
The other side of this is that CRDs can lead to abandoned resources; if you find your cluster borked, or shut down improperly, any resources which existed as CRDs (which, in cloud Kubernetes land, includes LoadBalancers) probably did not get cleaned up, and will be abandoned (but left running).
It's not clear that there is actually a good solution here that fits neatly in with existing CRD behaviour.
If you've used Helm for more than one day you'd know about the helm.sh/resource-policy: keep annotation, which makes your first big "accidentally deleted" concern pretty near moot, so long as it's your CI/CD tools that have the permissions & they are using Helm, and so long as random cluster users aren't futzing around poking at & deleting things.
Yes, you have to go way further & prevent your idiot users from being dumb, if you are giving them cluster role permissions. But again, have you considered not doing that?
Abandoned resources are no less of an issue. If there are external resources that don't have state (such as your load balancer example), the good news is you can delete them all & let the resource controllers recreate the real ones.
In general I feel like you are letting the 1% of concerns dominate & dissuade you, & that most people can make it very far with nothing extra.
I'm not sure why you're popping in to a thread from a week ago, starting with personal insinuations against my qualifications to have an opinion, and being a generally uncivil person.
You should try to make your points without the condescending attitude.
The entire idea of your infrastructure not being tied to your provider is completely lost when you are at any type of scale. I have a feeling that anyone who thinks it is easy or even usually worth it has never been part of planning a large scale migration.
At the same time, you usually end up spending more money and having worse results when you don’t go all in.
As far as why use EKS vs ECS, the “native service”? They seem to have feature parity, and ECS is easier to use for the uninitiated. But there are so many people who know k8s, and your knowledge is portable.
Which brings up my second point. Most software engineers don’t care about cloud mobility as much as they claim. They care about career mobility. There is a much better chance that you will leave a company and move to a company on a different provider than your company will. I’m not saying it’s a bad thing to focus on technologies that give you as an individual the most optionality.
> The entire idea of your infrastructure not being tied to your provider is completely lost when you are at any type of scale
Pretty categorical statement, and I don’t find this to be true at all if you avoid using anything managed except block devices and the VMs themselves. That is, unless “any type of scale” is tens of thousands of VMs and petabytes of storage, in which case it is indeed a moot endeavor, since someone who is still on public cloud at that point apparently likes to give all their money to cloud vendors anyway.
If all you’re doing is using a cloud provider to host a bunch of VMs, you’ve already lost the plot. Once you use any cloud provider as a glorified colo without changing any of your processes you’re already spending more than a colo and not getting any of the benefits of managed services.
Have you ever budgeted a project plan involving a large migration?
> Once you use any cloud provider as a glorified colo without changing any of your processes you’re already spending more than a colo and not getting any of the benefits of managed services.
So you’re saying running multi regional, burstable (metered by minutes or even seconds) VMs with redundant power and networking and without having to hire a bunch of maintenance staff is not one of the benefits? Imho it’s perfectly economical at small to medium scale.
> Have you ever budgeted a project plan involving a large migration
I’m doing one right about now. If you design your stack to be multi-regional from the get-go and don’t get yourself into the aforementioned cloud lock-in, it’s a simple matter of bringing up new Kubernetes clusters in a new cloud region and possibly setting up some peering.
“Managed services” in cloud are a Faustian bargain.
It’s also totally untrue that you need to adopt any specific proprietary features of public cloud to change your IT processes to break the ITIL straightjacket into a sane process. It’s just an abstraction, and there are plenty to choose from that work with whatever you already have.
First of all, they’re not actually managed services as we typically expect. The support experience is much more like a “hosted service”, with very circumscribed boundaries around feature scope, mostly because it is managed entirely by the kind robots rather than having humans actually manage anything.
Secondly, they’re lock in for debatable benefit.
Third, they’re almost always the way the clouds make their margin and you’re stuck with their design decisions, and have little control plane adjustability. Running Google Cloud SQL is expensive and crappy, I would much rather run my own DB on a Kubernetes cluster via a mature and commercially supported Kubernetes operator. Because it’s automated, reliable, and if I need to, I can get into the weeds.
Because the dirty secret is that all of the managed cloud services are just software, no different from what you install, they’ve just closed source certain components to make it seem magical. And yes you can file a ticket to get them to fix it, but remember the law of outsourced availability: you still own your SLO to your customer, you still own your end to end availability. All the “managed cloud” will do is cut you a check for the wasted pennies on a bill, while your customers leave and business tanks.
If you treat cloud like a commodity player that you can fire (not likely - more “shrink investments in favor of an alternative”) every few years if their quality goes down or price goes up, you’re in a much better position for cost, performance, and reliability. Same as it always was. The “one throat to choke” theory of outsourcing was always a business school bullshit line that never really worked, and that now has been adopted by tech experts who want to ensure their cloud-specific skills are always relevant, regardless of whether all these proprietary features are really a good idea.
“Speed + having no idea what you’re doing = use all the managed services!” seems to be the cloud pundit elevator pitch. But as with anything, if you don’t know what you’re doing, at some point you will be caught being a sucker.
The multi-cloud, open source ecosystem continues to grow over many proprietary cloud services for these reasons.
You act as if people to manage servers are free. How much time are you spending to avoid “cloud lock-in”? How much money are you spending on the extra support staff or are you making your developers do double duty? Is doing the grunt work of managing servers and services helping you win against your competitors? Is it helping you to go to market faster?
I’m not just speaking about cloud providers. Do you know how many healthcare systems are so tightly “locked in” to their EMR/EHR systems that it would make you cry?
This is just one small step in that direction. Once everyone is on the same page about Infrastructure-as-Data and behaves in a consistent and expected manner, we can move forward with wrapping these interfaces. Crossplane is one such project.
Is there something like CloudFormation console for EKS/ACK?
Last time I deleted a cluster it failed because of an NLB still being around but not accounted for as a CFN resource, even though I provisioned the cluster with CFN.