by hga
3945 days ago
The Danger team did build a great system for its time, but once Microsoft took over, it went downhill fast. Microsoft managed one of the biggest cloud computing screw-ups in history to date: https://en.wikipedia.org/wiki/2009_Sidekick_data_loss The incident caused a public loss of confidence in the concept of cloud computing, which had been plagued by a series of outages and data losses in 2009. It was also problematic for Microsoft, which at the time was trying to convince corporate clients to use its cloud computing services, such as Azure and My Phone. I've heard good things about Oracle's RAC, but it's understandably intolerant of your screwing up its disks (SAN mis/re-configuring) when you aren't properly maintaining backups. I've also heard the consultants you have to hire after you manage such a feat are expensive.
There are a number of problems with RAC, some of which come from people using it wrong, and some of which are inherent to RAC. "Using it wrong" covers things like people not understanding that it sits on shared storage, so it provides compute node resilience, not storage resilience. They probably should spend on some Data Guard (or equivalent), unless they want to be the DBA equivalent of the server admin who thinks you don't need backups because you've got RAID.
The built-in problems come from the fact that Oracle ASM doesn't check[1] the signatures on disks/LUNs presented to it. So if the SAN admin, I don't know, manages to somehow reverse the mappings for one LUN of 30 between the stress RAC and the dev RAC, Oracle will not refuse to start and say "that ASM disk has the stress signature on it"; instead, Oracle will overwrite the stress LUN with dev data for a while, then go to read it, then discover it doesn't have the on-disk structure it expects, then crash with a SEGV or some other entertaining but unhelpful error. But only after it's irretrievably corrupted the ASM group, of course.
[1] as of 10g, the last time I hit this problem.
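To make the "check the signature before you write" idea concrete, here is a minimal Python sketch of the kind of guard I'd have liked to see. Everything in it is made up for illustration: the header layout, the `MAGIC` value, and the function names are my own assumptions, and it has nothing to do with ASM's real on-disk format or any Oracle API. It just shows a mount step that reads a label first and refuses to touch a disk belonging to a different group.

```python
import io
import struct

# Hypothetical on-disk label: 8-byte magic followed by a 32-byte group name.
# This is an illustrative layout only, not ASM's actual disk header.
MAGIC = b"DSKSIG01"
HEADER = struct.Struct("<8s32s")


def write_signature(dev, group):
    """Stamp the device with the name of the disk group that owns it."""
    dev.seek(0)
    dev.write(HEADER.pack(MAGIC, group.encode("ascii")))


def read_signature(dev):
    """Return the owning group name, or None if the device is unlabeled."""
    dev.seek(0)
    raw = dev.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None
    magic, name = HEADER.unpack(raw)
    if magic != MAGIC:
        return None
    return name.rstrip(b"\0").decode("ascii")


def check_before_mount(dev, expected_group):
    """Refuse to proceed unless the label matches the group we expect."""
    found = read_signature(dev)
    if found != expected_group:
        raise RuntimeError(
            f"disk is labeled {found!r}, expected {expected_group!r}; "
            "refusing to mount"
        )


if __name__ == "__main__":
    # An in-memory buffer stands in for a LUN that belongs to "stress".
    lun = io.BytesIO(bytes(4096))
    write_signature(lun, "stress")
    check_before_mount(lun, "stress")  # matching label: passes silently
    try:
        check_before_mount(lun, "dev")  # swapped mapping: caught up front
    except RuntimeError as err:
        print(err)
```

The point is only the ordering: the check is a read, so a swapped LUN gets rejected before a single block of the wrong group's data is written to it.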