Seriously though, I’ve been wondering for a while whether I could build a GCC for x86-64 that would have 32-bit (low 4G) pointers (and no REX prefixes) by default and full 64-bit ones with __far or something. (In this episode of Everything Old Is New Again: the Very Large Memory API[1] from Windows NT for Alpha.)
Unfortunately the obvious `__attribute__((mode(...)))` errors out if anything but the standard pointer-size mode (usually SI or DI) is passed.
Or you may be able to do it based on x32, since your far pointers are likely rare enough that you can do them manually. Especially in C++. I'm pretty sure you can just call "foreign" syscalls if you do it carefully.
Especially how you could increase the segment value by one or the offset by 16 and you would address the same memory location. Think of the possibilities!
And if you wanted more than 1MB you could just switch memory banks[1] to get access to a different part of memory. Later there was a newfangled alternative[2] where you called some interrupt to swap things around but it wasn't as cool. Though it did allow access to more memory so there was that.
Then virtual mode came along and it's all been downhill from there.
Schulman’s Unauthorized Windows 95 describes a particularly unhinged one: in the hypervisor of Windows/386 (and subsequently 386 Enhanced Mode in Windows 3.0 and 3.1, as well as the only available mode in 3.11, 95, 98, and Me), a driver could dynamically register upcalls for real-mode guests (within reason), all without either exerting control over the guest’s memory map or forcing the guest to do anything except a simple CALL to access it. The secret was that all the far addresses returned by the registration API referred to the exact same byte in memory, a protected-mode-only instruction whose attempted execution would trap into the hypervisor, and the trap handler would determine which upcall was meant by which of the redundant encodings was used.
And if that’s not unhinged enough for you: the boot code tried to locate the chosen instruction inside the firmware ROM, because that will have to be mapped into the guest memory map anyway. It did have a fallback if that did not work out, but it usually succeeded. This time, the secret (the knowledge of which will not make you happier, this is your final warning) is that the instruction chosen was ARPL, and the encoding of ARPL r/m16, AX starts with 63 hex, also known as the ASCII code of the lowercase letter C. The absolute madmen put the upcall entry point inside the BIOS copyright string.
(Incidentally, the ARPL instruction, “adjust requested privilege level”, is very specific to the 286’s weird don’t-call-it-capability-based segmented architecture... But it’s has a certain cunning to it, like CPU-enforced __user tagging of unprivileged addresses at runtime.)
To clarify: when I said that “the boot code tried to locate the chosen instruction inside the firmware ROM”, I literally meant that it looked through the entirety of the ROM BIOS memory range for a byte, any byte, with value 63 hex. There’s even a separate (I’d say prematurely factored out) routine for that, Locate_Byte_In_ROM. It just so happens that the byte in question is usually found inside the copyright string (what with the instruction being invalid and most of the rest of the exposed ROM presumably being valid code), but the code does not assume that.
If the search doesn’t succeed or if you’ve set SystemROMBreakPoint=off in the [386Enh] section of SYSTEM.INI[1] or run WIN /D:S, then the trap instruction will instead be placed in a hypervisor-provided area of RAM that’s shared among all guests, accepting the risk that a misbehaving guest will stomp over it and break everything (don’t know where it fits in the memory map).
As to the chances of failing, well, I suspect the original target was the c in “(c)”, but for example Schulman shows his system having the trap address point at “chnologies Ltd.”, presumably preceded by “Phoenix Te”. AMI and Award were both “Inc.”, so that would also work. Insyde wasn’t a thing yet; don’t know what happened on Compaq or IBM machines. One way or another, looks like a c could be found somewhere often enough that the Microsoft programmers were satisfied with the approach.
Somewhat related. At some point around 15 years ago I needed to work with large images in Java, and at least at the time the language used 32-bit integers for array sizes and indices. My image data was about 30 gigs in size, and despite having enough RAM and running a 64-bit OS and JVM I couldn't fit image data into s ingle array.
This multi-memory setup reminds me of my array juggling I had to do back then. While intellectually challenging it was not fun at all.
The problem with multi-memory (and why it hasn't seen much usage, despite having been supported in many runtimes for years) is that basically no language supports distinct memory spaces. You have to rewrite everything to use WASM intrinsics to work on a specific memory.
It looks like memories have to be declared up front, and the memcpy instruction takes the memories to copy between as numeric literals. So I guess you can't use it to allocate dynamic buffers. But maybe you could decide memory 0 = heap and memory 1 = pixel data or something like that?