by residualmind
1225 days ago
One thing I've noticed that is sometimes forgotten, especially at earlier stages, is monitoring. You want to know how much self-healing is actually happening. Say you have your self-healing system in place: some k8s pods combined in a service with a little redundancy and very little state. Pods happily crash, another one takes over while a new one spins up. All is wonderful, and you don't worry about your availability anymore because everything just always works. One day you decide to look into what's happening in your containers and are shocked to find that one pod crashes every 0.3 seconds. It just spins up, answers one request, then dies, and a new one spins up... continuously. From the outside everything looks kind of OK, but in reality you are wasting massive resources and have a nasty bug that might even be losing you data, breaking consistency, creating load, etc.
Some sort of monitoring is a good idea, is what I'm saying.
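
To make that concrete, here is a minimal sketch of the kind of check I mean, assuming you feed it the JSON that `kubectl get pods -o json` prints. It sums the `restartCount` of each pod's containers and flags anything at or above a threshold. The function name `restart_report`, the threshold of 5, and the sample pod names are made up for illustration; a real setup would more likely alert from a metrics system than from a script.

```python
import json


def restart_report(pod_list, threshold=5):
    """Return (pod name, total restarts) pairs for pods at or above threshold.

    `pod_list` is a dict shaped like the output of `kubectl get pods -o json`.
    The name and the default threshold are illustrative assumptions.
    """
    flagged = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        # containerStatuses can be missing while a pod is still pending
        statuses = pod.get("status", {}).get("containerStatuses") or []
        restarts = sum(c.get("restartCount", 0) for c in statuses)
        if restarts >= threshold:
            flagged.append((name, restarts))
    # Worst offenders first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample data, trimmed to the fields the function reads
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "api-7d9f"},
     "status": {"containerStatuses": [{"restartCount": 412}]}},
    {"metadata": {"name": "api-2k4x"},
     "status": {"containerStatuses": [{"restartCount": 0}]}},
    {"metadata": {"name": "api-pending"},
     "status": {}}
  ]
}
""")

for name, restarts in restart_report(sample):
    print(f"{name}: {restarts} restarts")
```

Against a live cluster you would replace `sample` with `json.load(sys.stdin)` and pipe the `kubectl` output in. Even something this crude would have surfaced the crash loop long before anyone went digging in the containers.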
|
But the nice thing about using an already resilient system like K8S is that pod crashes won't stop your customers from working, and you can fix the issue in the background instead of having to throw up a status page and fix the problem immediately.
It's better to have a problem that your customers don't notice, because it buys you time to figure out the issue.