Xen hypervisor memory corruption due to x86 emulator flaw

	Xen hypervisor memory corruption due to x86 emulator flaw (xenbits.xen.org)
	125 points by fwilhelm 4123 days ago

7 comments

kyboren 4123 days ago

Both the bug itself and the way it was patched already have some people worried. From the Qubes OS developers: [1]

    Additional thoughts by Qubes Security Team
    ===========================================
    
    We see several problems that concern us about this vulnerability and
    patching process:
    
    1) It seems really difficult to understand why would anybody design a
    structure like the one shown above, which uses a union to store two,
    RADICALLY DIFFERENTLY TRUSTED data: an internal pointer into
    hypervisor memory and VM-provided UNTRUSTED DATA? Such design decision
    made by one of the core hypervisor developer is certainly worrying.
    We're not sure if it would be more worrying if this was done purposely
    vs by carelessness...
    
    2) We are not entirely convinced if the way Xen Security Team decided
    to address this vulnerability is really optimal, security wise. It
    seems like a more defensive approach would be to get rid of this
    dangerous construct of reusing the same memory for both an internal
    pointer and VM-provided data. Apparently Xen developers believe that
    they can fully understand the code, with all its execution paths, for
    decoding x86 operands. This optimistic attitude seems surprising,
    given the very bug we're discussing today.
    
    3) This lack of defensive programing and perhaps over confidence (in
    ability to fully understand all the code paths) has been demonstrated
    by the Xen Security Team also previously. In the recently released XSA
    109 [2], the official patch also seemed to address the problem much
    earlier in the execution path rather than at the actual offending
    instructions, i.e. those that performed the NULL-dereference. While
    asked specifically about adding at least an additional check on these
    instructions, the Xen developers were unwilling to implement it
    implying potential performance impact.
    
    4) This is all certainly a bit disconcerting and we hope we could
    start a bit more public debate on these issues, especially among
    independent security researchers. We still believe Xen is currently
    the most secure hypervisor available, mostly because of its unique
    architecture features, that are lacking in any other product we are
    aware of.

[1]: https://raw.githubusercontent.com/QubesOS/qubes-secpack/mast...
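
To make the first concern in the quote concrete, here is a minimal sketch of why overlapping storage is risky. It is not the actual Xen structure: `OperandUnion`, `OperandSeparate`, the field names, and the address constant are all hypothetical, and Python's `ctypes` is used only to mimic a C layout.

```python
import ctypes

# Hypothetical stand-in for a trusted, hypervisor-internal address.
INTERNAL_ADDR = 0xFFFF800000001000


class OperandUnion(ctypes.Union):
    """Both fields occupy the same 8 bytes, as in a C union."""
    _fields_ = [
        ("internal_ptr", ctypes.c_uint64),  # trusted: set by the hypervisor
        ("guest_value", ctypes.c_uint64),   # untrusted: supplied by the VM
    ]


class OperandSeparate(ctypes.Structure):
    """Same fields, but each has its own storage."""
    _fields_ = [
        ("internal_ptr", ctypes.c_uint64),
        ("guest_value", ctypes.c_uint64),
    ]


def pointer_after_guest_write(operand_type, guest_value):
    """Set the trusted pointer, store guest data, then read the pointer back."""
    op = operand_type()
    op.internal_ptr = INTERNAL_ADDR
    op.guest_value = guest_value  # in the union this overwrites internal_ptr
    return op.internal_ptr


if __name__ == "__main__":
    print(hex(pointer_after_guest_write(OperandUnion, 0x41414141)))
    print(hex(pointer_after_guest_write(OperandSeparate, 0x41414141)))
```

With the union, any code path that stores guest data and later reads the field as a pointer ends up with a guest-chosen address. With separate fields the pointer survives, at the cost of a few extra bytes per operand.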

zokier 4123 days ago

> We still believe Xen is currently the most secure hypervisor available, mostly because of its unique architecture features, that are lacking in any other product we are aware of.

Does anyone know why KVM would be considered less secure than Xen?

detaro 4123 days ago

(From memory; there are some design docs for Qubes OS floating around that discuss this.) Xen is relatively small and contained, whereas KVM sits on top of a full Linux kernel and can potentially access all of it, making it harder to tell what is accessible or exploitable and what is not. KVM also uses QEMU running as a process on the host Linux to interface with the VM, again exposing more potential attack surface. And I think Xen is better at isolating drivers, which for Qubes OS is a fundamental principle.

cthalupa 4122 days ago

>And I think Xen is better at isolating drivers

Xen allows for creating an entire stub domU solely for running the driver, then giving a running guest access via ring buffer in a shared memory segment.

(So, yep, you're correct in your thinking)
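
A rough sketch of the ring-buffer idea, assuming a single producer (the guest) and a single consumer (the driver domain). This is not Xen's actual ring layout or API: the `Ring` class and its fields are illustrative, and a plain Python list stands in for the shared memory segment.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring (illustrative only)."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.slots = [None] * size  # stands in for the shared memory page
        self.mask = size - 1
        self.prod = 0  # total requests written by the guest
        self.cons = 0  # total requests consumed by the driver domain

    def push(self, request):
        """Guest side: queue a request, or return False if the ring is full."""
        if self.prod - self.cons == len(self.slots):
            return False
        self.slots[self.prod & self.mask] = request
        self.prod += 1
        return True

    def pop(self):
        """Driver-domain side: take the oldest request, or None if empty."""
        if self.cons == self.prod:
            return None
        request = self.slots[self.cons & self.mask]
        self.cons += 1
        return request


if __name__ == "__main__":
    ring = Ring(2)
    ring.push("read sector 0")
    print(ring.pop())
```

The guest only ever advances `prod` and the driver domain only ever advances `cons`, so the two sides share data without either one needing access to the other's memory beyond the ring itself.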

rwmj 4123 days ago

My guess would be a couple of things: small Xen hypervisor vs potentially large Linux kernel, and driver domains. The latter involves putting each driver into its own domain (i.e. a Xen VM, the equivalent of a process), which means that bad drivers can do less damage to the rest of the system.

zokier 4123 days ago

Sounds like Tanenbaum vs Torvalds redux..

weland 4123 days ago

de Raadt's remarks about virtualization do spring up in my mind.

fwilhelm 4123 days ago

I wrote a blogpost with some more details about the bug, which you can find here: http://www.insinuator.net/2015/03/xen-xsa-123/

cesarb 4123 days ago

Interesting blog posts (it and the preceding one). It seems that reliably emulating the x86 architecture is made even harder by a few features not found in other popular architectures, like an extra level of indirection on memory access (segment registers and the corresponding segment overrides) and most instructions having a memory-accessing variant (instead of limiting memory access to separate "load" and "store" instructions, plus a few specialized atomic RMW instructions).
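
A simplified sketch of that extra level of indirection, assuming 32-bit protected mode and expand-up data segments only. The `Segment` class and `linear_address` function are made up for illustration and are not taken from any emulator; a real one also has to handle access rights, 16-bit and 64-bit modes, and much more.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class Segment:
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset (expand-up data segment assumed)


def linear_address(seg, base=0, index=0, scale=1, disp=0, size=1):
    """Translate a memory operand to a linear address through a segment."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Effective address: base + index * scale + displacement, wrapped to 32 bits.
    offset = (base + index * scale + disp) & MASK32
    # The last byte of the access must still be inside the segment.
    if offset + size - 1 > seg.limit:
        raise ValueError("segment limit violation")
    return (seg.base + offset) & MASK32


if __name__ == "__main__":
    ds = Segment(base=0x00010000, limit=0xFFFF)
    fs = Segment(base=0x70000000, limit=0x0FFF)
    # Same operand, different segment override, different linear address.
    print(hex(linear_address(ds, base=0x100, index=2, scale=4, disp=8, size=4)))
    print(hex(linear_address(fs, base=0x100, index=2, scale=4, disp=8, size=4)))
```

Because almost any instruction can carry a memory operand and a segment override, an emulator has to get this translation and its checks right on nearly every code path, not just in dedicated load and store handlers.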

mrmondo 4123 days ago

Interesting post, thanks. By the way your site seems very slow to load in Australia - it might benefit from use of a CDN if you don't already use one. (I usually recommend Cloudflare to people, their free service is great)

ambrop7 4123 days ago

Stupid question: Why does Xen need to emulate x86?

peterwwillis 4123 days ago

tl;dr, Windows. But also various limitations of how PV or HVM hosts work.

PVH fixes it so you don't need emulation in Linux, but it's a brand new feature and probably not production quality. Read this (it's not too technical) http://wiki.xen.org/wiki/Virtualization_Spectrum

As of this writing, Xen 4.4 and Linux 3.14 have experimental support for PVH DomUs and Xen 4.5 has support for PVH Dom0s. PVH allows practically native-hardware-speed guests without any emulation.

cthalupa 4122 days ago

Completely eliminating emulation isn't always desirable, unfortunately. The problem with PVH is that it forces the use of event channels to deliver interrupts.

Local APIC emulation is fully accelerated on modern processors, which is a big win for any use case that is heavy on interrupts.

amyjess 4123 days ago

Well, guess my Linode's getting rebooted again.

pilif 4123 days ago

As far as I understand, this is the public release of the issue that caused the bigger providers to force the reboots on their customers. This means that there won't be any more reboots (for these issues at least)

amyjess 4123 days ago

Ah OK, that makes sense.

Alupis 4123 days ago

> Well, guess my Linode's getting rebooted again.

Things happen. We should be glad that these sorts of security problems are being found and addressed; it would be naive to believe Xen or any other large codebase has zero security problems.

VMs should be regularly patched anyway, which usually requires a reboot now and then. If it were a physical server, the same would be true; just because things are in the "cloud" now doesn't mean they will have infinite uptime.

I understand that sometimes these services reboot and patch with little notice, but this should be built into any Terms of Service constructed with your client(s); i.e. "We will patch and reboot your service for critical vulnerabilities as quickly as possible, which in some circumstances may leave only short notice."

drzaiusapelord 4123 days ago

Not to mention, my physical servers take ages to reboot considering all the BIOS and RAID checking they do. My Linodes, being VMs, literally boot in like 10 or 15 seconds. Maybe less. That's really minor downtime. My HP DL380s take several minutes.

You can't migrate to a new kernel on Linode without a reboot anyway, so if you're proud of a 12+ month uptime, you're running a vulnerable kernel.

kbar13 4123 days ago

Yep. Plus, if you're doing VMs correctly, you should be able to spin up new ones quickly to cover for ones that go down.

tedunangst 4123 days ago

Depends. My physical server isn't shared with anyone. Most local exploits are not a particular worry. A security vuln, almost by definition, requires a shared resource. No sharing, no caring.

Alupis 4123 days ago

That would depend on how "local" a "local exploit" is. If it requires physical access to your system, then that's one thing. But if "local" means "on the network", that's far simpler for an attacker to pull off.

> A security vuln, almost by definition, requires a shared resource. No sharing, no caring

You would only not care if your servers have zero access to the internet and are air-gapped from the rest of your network (even then it's been proven some vulnerabilities can be exploited to gain access).

tedunangst 4123 days ago

"local" is generally understood to mean "not the network".

josh2600 4123 days ago

If you're really worried about having to deal with reboots, you can run Terminal on top of Linode and gain the ability to live-migrate all of your workloads (so you never have to take down your application because of the underlying metal rebooting).

I discussed it in detail previously on HN [0], but we give you the ability to live-migrate your workloads, even onto heterogeneous kernels. If that's something you really need, you can get it from Terminal today.

[0] https://news.ycombinator.com/item?id=9120289

amyjess 4123 days ago

Meh, it was just a snarky comment. I don't really care; I'm not doing much with my Linode anyway. I'm mostly just using it as a shellbox/IRC client/Mercurial backup.

peterwwillis 4123 days ago

If they have one spare Xen host, they can live-migrate all guests from one host to the spare, patch the original host, reboot it, then live-migrate the guests back from the spare to the original, and repeat for the next host. Patching and rebooting them all at once might be quicker, though.

mikeash 4123 days ago

Do you need to migrate them back? Would be faster to just use the newly patched original host as the new spare, and repeat like that.
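
A toy simulation of that rotating-spare variant, assuming the spare starts empty and that every host has room for one other host's guests. `rolling_patch` and the host and guest names are invented for illustration; no real toolstack is being called.

```python
def rolling_patch(hosts, spare):
    """Patch every host once, migrating each host's guests only one time.

    hosts: dict mapping host name -> list of guest names
    spare: name of an empty host that is not a key in hosts
    Returns (final placement, name of the final spare, log of steps).
    """
    placement = {name: list(guests) for name, guests in hosts.items()}
    placement[spare] = []
    log = [("patch", spare)]  # the empty spare can be patched with no downtime
    for name in hosts:
        # Live-migrate this host's guests onto the already patched spare.
        placement[spare].extend(placement[name])
        placement[name] = []
        log.append(("migrate", name, spare))
        # Patch and reboot the now empty host; it becomes the next spare.
        log.append(("patch", name))
        spare = name
    return placement, spare, log


if __name__ == "__main__":
    placement, spare, log = rolling_patch({"h1": ["a", "b"], "h2": ["c"]}, "s")
    print(placement, spare)
    for step in log:
        print(step)
```

Each guest is migrated exactly once, instead of twice as in the migrate-and-return scheme, and the last host patched is left over as the new spare.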

jroll 4123 days ago

This becomes a physics problem; it's a race between how quickly VMs can be migrated and when the embargo is lifted.

avsm 4123 days ago

The XenServer toolstack supports exactly this mode of operation in a pool of physical hosts via a 'host evacuate' operation that live relocates VMs away and brings them back once the host upgrade is complete.

http://docs.vmd.citrix.com/XenServer/5.0.0/1.0/en_gb/install...

mukyu 4123 days ago

"Non-maskable interrupts triggerable by guests " http://xenbits.xen.org/xsa/advisory-120.html

"Non-standard PCI device functionality may render pass-through insecure " http://xenbits.xen.org/xsa/advisory-124.html

the others also released today

avsm 4123 days ago

It's worth noting that most of these bugs are right in the innards of the x86 emulation. Xen/ARM is a breath of fresh air, since they took the decision to only support the new ARMv7 virtualization extensions. This eliminates the need for a per-VM qemu running in dom0 and for the instruction emulation plumbing.

We've got a distribution of Xen 4.4/ARMv7/Ubuntu for anyone curious to try it out on a cheapo Cubieboard2 or Cubietruck over at https://github.com/mirage/xen-arm-builder (with prebuilt SD card images at http://blobs.openmirage.org)

Thaxll 4123 days ago

Why is it that Xen seems to have so many security issues compared to KVM?

mentat 4123 days ago

More production use means more attention. Plenty of security issues everywhere.

Someone1234 4123 days ago

In particular when you're developing in languages which are insecure by design.

Alupis 4123 days ago

Insecure code can be written in any language.

Someone1234 4123 days ago

Just like any car can crash. However some cars are more dangerous than other cars, just as some programming languages are more likely to produce insecure code than other programming languages.

Nobody is proposing re-writing a hypervisor in Java or Python, but C/C++ isn't the only game in town anymore for unmanaged code, and the alternatives are designed from the ground up with security in mind.

Alupis 4123 days ago

> Just like any car can crash. However some cars are more dangerous than other cars, just as some programming languages are more likely to produce insecure code than other programming languages.

> and the alternatives are designed from the ground up with security in mind

The RMS Titanic was billed as one of the safest ships on the sea, yet poorly implemented protocols and practices, negligent leadership, and disregard for best practices led to one of the most catastrophic maritime disasters.

Using the most "secure" programming language in the world, one can still design very insecure code. Conversely, using the most "insecure" programming language in the world, one can still design very secure code. This would boil down to the skill of the engineers, competence of leadership and adherence to best practices.
