by aguacaterojo
865 days ago
Very similar story for my team, including the two cert-expiry cluster disasters early on that each required a rebuild. We migrated from Kubespray to kOps (with almost no deviations from a default install) and it's been quite smooth for 4 or 5 years now. I traded ELK for ClickHouse, and we use Fluent Bit to relay logs, mostly created by our homegrown OpenTelemetry-like lib. We still use Helm, Quay & Drone. Software architecture is mostly stateless replicas of ~12 mini services with a primary monolith. DBs etc. sit off cluster. A full cluster rebuild and switchover takes about 60-90 min; we do it 1-2x a year, and 3 developers in a team of 5 can do it (thanks to good documentation, automation, and keeping our usage simple). We have a single cloud dev environment; local dev is just running the parts of the system you need to affect. Some tradeoffs, and yes, burned time to get there, but it's great.