> Also, anyone in this industry long enough has been around for "Oh, we will just replace that broken piece of hardware" that ended up "WHY IS EVERYTHING ON FIRE?" because versions didn't match up, hardware was rejected

I've been doing this for 25 years and I'm not sure what this means. Dell isn't going to come back to you and say "sorry, but we can't fix this." Under the warranty SLA, the worst-case scenario is that they replace the entire machine if they have to, although I don't remember ever seeing it come to that.

> just plain "Actually, THAT failure mode isn't redundant."

When it comes down to it, similar issues exist with clouds: regions, availability zones, etc. Big clouds have had multiple widespread outages just this year [0]. From that reference you can see that MS and Amazon themselves struggle to design, build, and run solutions for their own products in their own clouds. It's always interesting to see marquee household-name companies/products/solutions go down when US-East (or whatever) is having a bad day again. Cloud can be a lot of things, but a silver bullet for reliability and uptime isn't one of them.

[0] https://www.forbes.com/sites/emilsayegh/2024/07/31/microsoft...
Dell/EMC says, "Hey, here is a drive replacement." We do it, and two hours later the volume is knocked offline. Apparently there was a mismatch between the backplane version and the drive version, and through some weird edge case it knocked the volume offline. Yes, they fixed it. No, it wasn't pretty, since a bunch of applications had to be recovered.
No, public clouds are not 100% reliable either. It's just that their failures tend to leave you twiddling your thumbs, versus having your hair on fire while you're on the phone with the vendor trying to get it resolved.