| I think there is more to the story for some of these points, and it can be dangerous to take them at face value as best practices. For example, on the liveness / readiness probe item, the article says, > “The other one is to tell if during a pod's life the pod becomes too hot handling too much traffic (or an expensive computation) so that we don't send her more work to do and let her cool down, then the readiness probe succeeds and we start sending in more traffic again.” But this is often a very bad idea, because it masks long-term underprovisioning of a service. If readiness / liveness checks contending with real traffic ever results in congestion, you need the failing checks to surface it so you can increase resources. If you set things up so this failure won’t surface, such as by letting the readiness check take that pod out of service until the congestion subsides, you’re only hurting yourself by masking the issue. It basically means your readiness check acts as a latency exception handler outside the application, which is a very bad idea. The other item that is far more complicated than it seems is using IAM roles / service accounts instead of a single shared credential. If your company has an enterprise security team that provides extremely low-friction tools to generate service account credentials and inject them, then sure, I would agree it’s a best practice to ruthlessly give every application its own credentials to each shared resource, so you can isolate access and revocation. But if you are on an application team and your company doesn’t have mature security tooling managed by a separate security team, this can become a bad idea. It can lead to superlinear growth in secrets management overhead, because every separate application needs manual service account creation and credential propagation. 
Non-security engineers will store things in a password manager, copy/paste them into some CI/CD tool, embed credentials permanently as ENV variables in a container, etc., all because they can’t build and maintain end-to-end service account credential tooling on top of their job as application engineers. It’s something they think about twice per year and need off their plate immediately so they can move on to other work. Across teams, it means you end up with 20 different team-specific ways of coping with the rapid growth of service accounts, leading to an even larger security surface area, a risk of credential-based outages, omission of important testing because impersonating the right service account in the right place is too hard, etc. Very often there is a real trade-off to consider: a single service account credential with just one way to be injected into every service can be safer in the bigger picture. Yes, it means a credential issue for any service becomes an issue for all of them, and this is a risk that you want automated tooling to mitigate. But it very often will be less of a risk than insisting on a parochial best practice of individual service account credentials, which results in much worse and less auditable secrets workflows overall unless the process is completely owned and operated by a central security team in a way that doesn’t create any approval delays or workflow friction for application teams. |
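
To make the readiness probe point concrete, here is a minimal sketch in Python of what I mean. All names here (`ServiceState`, `readiness_status`, `overload_alert`, the idea of a `/readyz` endpoint) are hypothetical and only for illustration, not any real Kubernetes or framework API. The assumption is that readiness reports only whether the process has started and can reach its dependencies, while load is exposed as a separate signal for monitoring and capacity decisions, so an overloaded pod stays visible instead of being quietly pulled out of rotation.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical snapshot of a service's health, for illustration only."""
    started: bool = False          # set once initialization has finished
    dependencies_ok: bool = False  # e.g. the database connection is established
    in_flight_requests: int = 0    # current load; deliberately NOT used for readiness


def readiness_status(state: ServiceState) -> int:
    """Return the HTTP status a hypothetical /readyz endpoint would send.

    Readiness depends only on startup and dependencies, never on load,
    so congestion cannot hide behind the probe.
    """
    return 200 if state.started and state.dependencies_ok else 503


def overload_alert(state: ServiceState, max_in_flight: int) -> bool:
    """Return True when load exceeds a threshold.

    This is meant to feed alerting or autoscaling, not the readiness probe.
    """
    return state.in_flight_requests > max_in_flight


if __name__ == "__main__":
    busy = ServiceState(started=True, dependencies_ok=True, in_flight_requests=500)
    print(readiness_status(busy))     # the pod still reports ready under load
    print(overload_alert(busy, 100))  # the congestion shows up as an alert instead
```

The design choice is simply that the two signals are kept apart: the probe answers "can this pod serve at all?", and the alert answers "do we need more capacity?".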