by Apocalypse_666 2468 days ago
I wrote this whole thing below as a reply to someone stating that I should just stop complaining and figure it out, but that comment has since been deleted. Figured sharing my frustration might be cathartic anyways :)

If only I could!! That’s exactly the frustrating part: there seems to be no way of grokking what goes on under the hood, and there are so many different ways of setting up a cluster and very few have any information about them online whatsoever.

As a practical example, what happened yesterday was that all of a sudden my pods could no longer resolve DNS lookups (took a while to figure out that that was what was going on, no fun when all your sites are down and customers are on the phone). Logging into the nodes, we found out about half of them had iptables disabled (but still worked somehow?). You try to figure out what’s going on, but there are about 12 containers running in tandem to enable networking in the first place (what’s Calico again? KubeDNS? CoreDNS? I set it up a year ago, can’t remember now...) and Googling is to no avail, because your setup is unique and nobody else was harebrained enough to set up their own cluster and blog about it. Commence the random sequence of commands I’ll never remember until by some miracle things seem to fix themselves. Now it’s just waiting for this to happen again, and being not one step closer to fixing it.

4 comments

Well, that sounds like a problem of "I run my own cloud", and not a problem of Kubernetes. You don't remember how you set it up, oh well.

If you use a managed Kubernetes like GKE or AKS (not AWS since they suck, EKS is not really managed), then you skip the whole "there is a problem in my own cloud of my own making".

btw, I also encountered DNS problems in kubernetes, on ACS, it took 5-10 minutes to resolve, and was caused by ACS not having services enabled to restart dns upon reboot, lol.

The whole promise of Kubernetes (and, coincidentally, containers) is that you are not locked to a platform/provider/OS.

Reading this comment made me realise that often new technology is adopted because it is optional and promises options. But those options quickly shrink away and suddenly you’re locked into it.

Not to invoke a controversial name. But this is what happened with systemd.

Yes! You are not locked to a platform/provider/OS. Your GKE cluster operates more or less like your EKS cluster operates more or less like your cluster in Azure, DigitalOcean, etc. Kubernetes is a deployment platform you target and complaining that the things two layers under the hood of that are different is a false analogy.

Moving from one Kubernetes Provider to another is not zero time. You need to learn some differences in the way GKE ingresses vs AWS ELBs work, etc. It is a substantially more tractable problem than the differences between Cloud Bigtable and DynamoDB, and that one is still a tractable problem.

The way to fight lock-in is not, and has never been, "These two providers offer exactly the same service". It has been about avoiding "These two providers offer nothing that is analogous, and their documentation is directly written to encourage using practices that do not port". It has never been an all-or-nothing thing.

Would you be willing to tell us what kind of business you are in, where you have few enough customers that they can reach you by phone, but still need such a large number of machines that you need Kubernetes?
There's the rub: we don't actually need Kubernetes at all! Just a case of resume-driven development by a predecessor
To me it sounds like the problem wasn't kubernetes, the problem was that you (your predecessor) rolled your own instead of going down the path of using something with a support number. Redhat obviously comes to mind first but there are countless options with enterprise support included.

Have you considered rebuilding/moving the containers onto something more "enterprisey"?

A 'support number' does not solve the technical issues with kubernetes, it just kicks the responsibility down the line so that someone else has to deal with it.
A "support number" gets you access to an expert. No offense to OP, but it doesn't sound like he's got a deep knowledge of the internals of Kubernetes. Which is fine, because it also sounds like that's not his main job, just something he's tasked with taking care of. That's literally why enterprise support exists. If we all had infinite time and brainpower to be experts in everything, we could just roll-our-own. But we don't. Which is why AWS exists, and IBM bought Redhat for billions of dollars.
Now that the myth of experts has been accurately described, let's say a few words about the reality.

If IBM wanted the experts, they would have hired or grown their own. What they wanted was, I guess, the contacts (actual and prospective).

My experience with support contacts is that you often have access only to someone who is no more of an expert than you, and who cares much less about your customers than you do. On several occasions it turned out the "expert" had been the one benefiting from the teaching of the in-house "not supposed to be expert but knows more than the expert" guy (and yes also, and maybe especially, with "reputable" large companies like IBM or Oracle).

I can even remember a particular instance when the expert had no access to his company's internal documentation to get details about a specific error message we were hitting, and we had to find a pirate copy of some internal manual from some Russian website and hand it over to him.

It makes sense to have a service contract when you really have no knowledge at all of the domain, but as soon as it's related to your daily job you will quickly realise that experts are mythical characters whom your contractor has no better access to than any other company, including your own.

That's practically 99% of the time really. Even in enterprise scenarios Kubernetes is very, very rarely required.

I've yet to come across a single instance where such rapid scaling happens and stays consistently high.

Most of the time you know well in advance when your resources will be put to the test.

I’m not trying to defend kubernetes complexity or saying it should be used for deployment of all server-side software, but I’d push back on the oft-repeated idea that kubernetes is just solving for scale. It’s a model of deployment that happens to make scaling easy, but the model has many advantages other than scale.
Resume-driven development is unfortunate, but a valid strategy when employers who run everything on one server that hasn't been updated since 2003 start listing 5 years of Kubernetes experience as a requirement in their job posting.
There's the start-up kind - "We need to be able to scale! We will potentially need to process billions of users!" ...waiting...
I love Kubernetes and think it solves my problems very well. Although I think I have problems that generally require it due to scale and failover.

But I have also had a number of DNS problems that we still haven't resolved, and they sometimes go away on their own. Same for iptables rules issues. This is of course on a hosted Kubernetes cluster at a large supercomputing center. (I didn't set it up, I just have to fix it. Ugh.) At Google, it's been great and we've had no networking problems, but they almost certainly run their own overlay network driver.

The various networking solutions you can plug into Kubernetes seem pretty spotty, and they are very hard to debug. I still haven't figured it out myself. But I am trying not to throw the baby out with the bathwater. I think the networking (and storage) parts will get better.
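
For DNS problems that come and go like the ones described above, one low-tech way to at least catch them in the act is a small probe run from inside a pod. What follows is only a rough sketch, not anything from this thread: the function name `probe_dns` and its parameters are made up for illustration, and the hostname in the usage note assumes the default `cluster.local` cluster domain. It only counts failed lookups; it does not say why they fail.

```python
import socket
import time


def probe_dns(names, attempts=3, delay=0.0, resolver=socket.gethostbyname):
    """Resolve each name `attempts` times and return {name: failure_count}.

    `resolver` defaults to the system resolver; it is injectable so the
    function can be exercised without a real network.
    """
    failures = {name: 0 for name in names}
    for _ in range(attempts):
        for name in names:
            try:
                resolver(name)
            except OSError:
                # socket.gaierror (lookup failure) is a subclass of OSError.
                failures[name] += 1
        if delay:
            time.sleep(delay)
    return failures
```

Running something like `probe_dns(["kubernetes.default.svc.cluster.local", "example.com"], attempts=10, delay=1.0)` on a schedule and logging any non-zero counts would separate "cluster-internal names fail" from "everything fails", which is at least a starting point.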

I feel for your troubles. Letting you know you can move it into EKS or Google cloud could probably save you a lot of headaches in the long run.