Hacker News new | ask | show | jobs
by philipstorry 103 days ago
Not a bad article - thanks!

Others are pointing out that you cannot understand everything - and that's true enough.

But you only need to understand what's important. The experience of a good expert helps you to find that out.

As a systems administrator the recent AWS outage in the Middle East is the best recent example. There will be roughly three types of companies, separated by their understanding:

- Don't Understand - these companies thought that the cloud would handle this kind of thing for them, and are probably going to be doing a lot of finger-pointing in the near future.

- Do Understand, Don't Care - these companies did understand that high availability meant going multi-region, but decided against it for whatever reason. Probably cost vs perceived likelihood. These companies know that they've made a mistake. Short term they're wondering how to survive it, long term they'll be re-assessing their risk acceptance. Many may decide to stay single-region, but at least understand why.

- Do Understand, Do Care - these companies will simply be checking that their procedures worked for any manual parts of their failover, plus possibly looking at any improvements they can make given the real-life experience they've gained.

An LLM is just going to tell you how to implement it. It's not going to be thinking "what sort of availability do we require?", it may never start that conversation unless explicitly prompted. And even then it's going to return consensus opinions, which may not be what you want when evaluating risk.

I'd love to think a lot of companies will be looking at this event and updating their own risk register or justifying their existing risk decisions for hosting. But let's be honest - most won't even have thought about it, and won't until it goes wrong.