| a quick bird's eye view summary to get your bearings: (caveat, I'm taking shortcuts so things may not survive scrutiny but it should be good enough for an overall understanding and getting down to the truth if you so desire) emulation: software that implements and simulate hardware to mimic it as expected by software that would run within emulation this is slow, because you have to really pretend hardware down to the details (simulate clocks, irqs, video output, etc). often you can get away with not being 100% exact and accurate depending on the emulated software's assumptions or you don't care if some specific output is incorrect (e.g graphical difference or oddity in games), which gives you some nice speedups. 100% accurate emulators can and do exist but you can see the difference in requirements (see mesen, higan) with inaccurate emulators. some people quickly figured out that if you are running software compiled for a specific target (e.g a x86_64 CPU) that is the same target as the host then it kinda doesn't make sense to emulate the CPU... and that you can kinda obviously pass through cpu instructions instead of simulating the very hardware you're running on... this is: virtualization: instead of emulating, expose a non-emulated device from the host that the software is known or expected to be compatible with, thus saving a lot of cycles. for the CPUs this is typically done through an instruction that creates an isolated context in which the software will run. reminder: a CPU has multiple "rings" already under which software is run, for security and stability reasons (memory protection, etc...), e.g the kernel runs under ring 0 and userland processes are under ring 1, each in a different context preventing one to access the other in the same ring, and to go up rings, all checked at the hardware level. the virtualization instructions roughly do the same thing and nest things so that a virtual cpu runs code in a separate ring+context, in a way that the guest OS thinks it's alone. see also type 1 (xen, hyper-v) vs type 2 (qemu, vmware, parallels) hypervisors. other devices can be implemented in lightweight, efficient ways, and exposed as virtual devices to the virtual machine for the nested OS to pick up. e.g instead of emulating a full PCI network card complete with pseudophysical hardware matching a real one, one can have a simpler device exposed that merely takes ethernet packets and inject them into the host kernel's network stack. or same for IO, instead of emulating a sata device complete with commands and all, one could just expose a block device and ask the host kernel to read its data from some host file directly. this requires dedicated drivers for these virtual devices in the guest OS though. virtualization is as good as native because often it's literally native, only with hardware sandboxes and maybe a few added indirect wrappers, that Windows on PCs and apps and games on Xboxes actually run in VMs under HyperV all the time! so back to the CPU side of things, you can notably see how it is impossible with virtualisation for a CPU to execute anything else than its own instruction set. now, executing foreign instructions more efficiently than literally interpreting them can be achieved through various means such as binary translation or dynamic recompilation, but it still has to take some guest assumptions into account, e.g Rosetta 2 (which is neither virtualisation nor emulation) translates x64 code to arm64 but x64 assumes a certain ordering of operations between cpu and memory that just cannot work as is on arm64, so there's a special Apple-specific instruction to switch a CPU thread to a different memory model, still running arm64 instructions (translated from x64) but in a way that matches the x64 behaviour. there are many more gritty details, and things are not always as clear cut as I described (or flat out lied about, the Feynman way) but it should be a good enough model to start with. |