by dom_hutton
1097 days ago
If I could have my time again, I'd focus on enabling the broader engineering group through an incident management process, readiness exercises, some aggregate incident analyses for the org to learn from, and leaning into observability. Infrastructure is kind of a solved problem for common use cases today; it just requires the expertise. With our ducks in a row, I'd next look to a GRC function for the compliance bits, whilst splitting the platform engineers' time between embedding engagements and tooling investments. You're on the right path, man. I'd love to have known back then what I know now, but unfortunately time doesn't work like that.
|
This is the problem, however, for many (older) companies. They either don't care or quite literally don't know about the infrastructure solutions out there that can save thousands of hours of headache per year. Sure, many companies with legacy systems have a "don't fix what isn't broken" mindset, but from what I've seen, I always ask: if shipping and modifying new versions of a system takes hours or even days to complete, is the system really not 'broken'? I guess I never realized it, but having automated and clean infrastructure with tests and uptime metrics is a must-have for me on anything I build going forward. Take 2-3 weeks to save months of headache.