by Yizahi 2137 days ago
One reason, for example, can be providing better reliability to customers who were using a certain application on premises and need or want to continue doing so. I've watched our solution go through three stages. First, a monolithic blob of C++ code, where any major failure crashed everything. Then several modules: monitoring could crash and restart, hopefully without interrupting service, but when the performance part crashed the whole service stopped and maintenance time was long. Finally k8s, where all parts are split into separate containers and the performance part is split into smaller, fully redundant chunks, so when one crashes it is a) restarted without affecting the other 99% of users, and b) restarted much quicker, ten times quicker, meaning less downtime for the 1% of affected users.
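To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The user count and the 30-minute baseline restart are made-up values for illustration only; the 1% blast radius and the ten times quicker restart are the figures from above. The function name is hypothetical too.

```python
def affected_user_minutes(users: int, affected_fraction: float, restart_minutes: float) -> float:
    """User-minutes of downtime caused by a single crash."""
    return users * affected_fraction * restart_minutes


# Assumed values, not from our real system.
USERS = 10_000
BASELINE_RESTART_MIN = 30.0

# Modular stage: a crash in the performance part stops service for everyone.
modular = affected_user_minutes(USERS, 1.0, BASELINE_RESTART_MIN)

# k8s stage: one redundant chunk crashes, 1% of users are hit,
# and the restart is ten times quicker.
sharded = affected_user_minutes(USERS, 0.01, BASELINE_RESTART_MIN / 10)

ratio = modular / sharded
print(f"modular: {modular:.0f} user-minutes per crash")
print(f"k8s:     {sharded:.0f} user-minutes per crash")
print(f"ratio:   about {ratio:.0f}x")
```

Under these assumptions the per-crash impact drops by roughly a factor of 1000 (100x from the smaller blast radius, 10x from the faster restart). This only counts a single crash; it says nothing about how often crashes happen with more moving parts.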

But k8s introduced a non-trivial amount of complexity, plus its own bugs and maintenance cost, meaning new dedicated engineers just to maintain and develop the k8s tooling. And we had to rewrite a lot of legacy code. But the trade-off is much better for a big project. Cutting downtime by an order of magnitude and being able to boast about it to the board: apparently priceless :)