Somehow back when Amazon had to reboot every EC2 server to fix a Xen bug, my "insecure" hypervisor-less server didn't require such action. I think I'll prefer to keep sailing without that particular bulkhead.
The reboots happened because Amazon cuts a lot of corners in their Xen setup, on the grounds that instances are supposed to be "disposable". A proper VM cluster would use dedicated storage nodes that export over something like iSCSI, so a live migration would only require transferring a memory snapshot, or it would fall back to the native, slower migration that also snapshots the disk. But that's just one of many ways AWS is rather broken from an operations standpoint.
What you call a proper VM cluster has been pretty terrible in my experience, both as a developer on an early cloud and as a consumer of large government-facing clouds.
Typically the network or SAN becomes oversaturated and the VMs shit the bed. AWS, on the other hand, was considerably more reliable, and I'd argue they've made better decisions rather than cut corners.
The SAN becoming oversaturated isn't something that just "happens". Between establishing limits ahead of time and monitoring, that shouldn't happen without someone knowing well ahead of time.
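To make "establishing limits ahead of time" a bit more concrete, here's a rough sketch of the kind of check I mean. Everything in it is made up for illustration: the function name, the 80% warning threshold, and the IOPS figures are assumptions, not anything from a real SAN or vendor tool.

```python
def san_headroom(capacity_iops, vm_limits, warn_ratio=0.8):
    """Compare the sum of per-VM IOPS limits against what the SAN can deliver.

    capacity_iops: IOPS the SAN is assumed to sustain (hypothetical figure).
    vm_limits:     dict of VM name -> IOPS limit assigned to that VM.
    warn_ratio:    fraction of capacity at which to start warning (assumed 0.8).
    Returns (committed_iops, ratio, status).
    """
    committed = sum(vm_limits.values())
    ratio = committed / capacity_iops
    if ratio >= 1.0:
        status = "oversubscribed"
    elif ratio >= warn_ratio:
        status = "warn"
    else:
        status = "ok"
    return committed, ratio, status


if __name__ == "__main__":
    # Hypothetical numbers: a 10,000 IOPS array and three VMs with fixed limits.
    limits = {"vm-a": 3000, "vm-b": 4000, "vm-c": 2000}
    print(san_headroom(10_000, limits))
```

Run something like this whenever a VM is provisioned and the "warn" state shows up well before the array actually saturates. It only works if per-VM limits can actually be enforced, of course.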
Just to be clear, I'm defending Amazon against an accusation that they cut corners on an install that was up in 2007 or earlier. Now, we could have focused on Xen guests shitting the bed for no apparent reason, or on flaky switch ports, but we decided to focus on storage.
OK, so I'm not claiming to be an expert, but as someone who was working in the area in 2007: if you were buying something off the shelf like it ain't no thing, you were getting something like a NetApp with a limit of 512 iSCSI initiators, or a Sun Amber Road where your only form of automation was an SSH console with a big warning stating it was unsupported.
From memory, there was no such thing as setting quotas on the number of IOPS an iSCSI initiator could do. In fact, I'm fairly sure IOPS quotas just didn't exist, period, as the vendors weren't really up to speed with this new selling-VMs thing. So basically, we're suggesting that it's a good idea to just buy a SAN to run an undetermined number of VMs that are going to do an undetermined number of IOPS.
OK, cool, you're now indebted to storage vendors selling you new shelves at £80,000 a pop for those extra IOPS you so desperately need. Now, to be fair, Amazon could probably afford it, but your VMs would still be a lot more expensive, and would probably still have been totally disposable when your switch port decided to blip traffic to the SAN or, as previously stated, Xen shat the bed.
None of these things might be a problem today. I don't know; I'm more a consumer than a producer of clouds these days. But I'd suggest these criticisms are bullshit. They come from some obviously smart people, but they're bullshit nonetheless.
Well true; even the RMS Titanic, with its fifteen bulkheads, was no match against an iceberg. Let's just hope that this new hypervisor for OpenBSD is built with better-quality steel :)
Also, it's worth mentioning that EC2 instances are meant to be ephemeral; Amazon doesn't provide any semblance of a guarantee that your instances won't reboot, and assumes that you intend for all your "machines" to be arbitrarily rebootable. Not saying that any hypervisor implementation right now is particularly good; only that Amazon isn't exactly the best representation here.
How many bugs has the OpenBSD team found, versus how many have been found in Xen? That would be a relevant comparison. From there, you'd assess the exploitability of each, given OpenBSD's attention to mitigation.
What you said, on the other hand, was meaningless, given that OpenBSD has had bugs that could lead to a crash. The real question is, "Do Xen or security-focused virtualization schemes (a) reduce the number of vulnerabilities with the impact of kernel-mode 0-days, and/or (b) prevent, contain, or facilitate easy recovery from OS- and app-level 0-days?" Prior experience in security-focused efforts shows the answer is yes to both questions. Xen isn't one of those efforts, as the existence of the Xenon project shows. However, its small size and improvements over time make it substantially less risky than an arbitrary OS + software combination, especially if the layer above is also addressed (e.g. MirageOS). Even Galois Inc.'s conservative teams are using it in some work.