Does the microcode give any hints on why the general PUSH and POP are in completely different places in the opcode map (push is FF/6, pop is in its own group in 8F/0 with 8F/1-7 invalid, while FF/7 is unused)? It almost looks like FF/7 was supposed to be the pop. I've always wondered what 8F/1-7 and FF/7 do on an 8086/8 too, but it's very hard to find that information.
What's "random logic"? From context, it sounds like circuitry that explicitly implements the functionality of an opcode, as opposed to circuitry that can be used by the microcode, or something?
To expand on that, "random logic" means that it looks random; it's not actually random. This is in contrast to circuits that have an underlying structure to them, like a PLA or ROM.
> While most of the unused parts of the ROM (64 instructions) are filled with zeroes, there are a few parts which aren't. The following instructions appear right at the end of the ROM [...]
Given that they're right at the end — and seemingly intentionally written there after the rest of the unused space before them was zeroed — might those bytes be a checksum of the ROM?
I don't think there's anything on the chip that could compute a checksum of the microcode ROM contents. It could be some kind of copyright message perhaps, though I don't know how it's encoded and it's only 42 bits long so there isn't much space for anything meaningful.
I would guess that it’s not a runtime-verified checksum, but rather a simple embedded “sum complement” value, used for ROM-mastering-time integrity verification.
A sum-complement value is a value computed from some data, such that, when the data is checksummed with the sum-complement value now embedded into it, the data will sum to zero. This approach to checksumming is useful, as any potential verifier just has to throw the image-as-a-whole through the checksumming algorithm, and ensure that the output is zero. It doesn’t need one iota of knowledge about what it’s verifying. It doesn’t even need an extra machine-register to hold the expected checksum.
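To make that concrete, here is a minimal sketch of the idea in Python. It assumes the simplest possible checksum (an 8-bit sum of all bytes, modulo 256) and a one-byte sum-complement field. Real ROMs used various widths and algorithms, and the function names here are made up for illustration.

```python
def checksum8(data: bytes) -> int:
    """Sum every byte modulo 256 (an assumed, deliberately simple checksum)."""
    return sum(data) % 256


def embed_sum_complement(image: bytes, offset: int) -> bytes:
    """Return a copy of `image` whose byte at `offset` makes checksum8() zero."""
    patched = bytearray(image)
    patched[offset] = 0                       # ignore whatever was in the field
    patched[offset] = (-sum(patched)) % 256   # value that cancels the rest
    return bytes(patched)


def verify(image: bytes) -> bool:
    """A 'blind' verifier: no expected value needed, only a test for zero."""
    return checksum8(image) == 0


# Hypothetical ROM contents with the last byte reserved for the field.
raw = bytes(range(1, 200)) + b"\x00"
rom = embed_sum_complement(raw, len(raw) - 1)

corrupted = bytearray(rom)
corrupted[10] ^= 0x04                         # flip a single bit
```

Here `verify(rom)` is true and `verify(bytes(corrupted))` is false, and the verifier never had to know where the field lives or what the rest of the data means.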
These “blind” checksums allow ROM production hardware (programmers, copiers) to both pre-verify the integrity of the input image, and to post-verify that it has programmed the image onto a chip successfully. No special container format for the ROM image is required, nor is the ROM image required to be structured in any particular way (which is good, because ROMs are used for all sorts of things, not just code.) The ROM image can be any opaque blob, just as long as it sums to zero.
In fact, you don’t even need a ROM “image” at all. It’s possible to integrity-verify a programmed ROM “against itself”; and thus, a hand-programmed ROM (e.g. an EEPROM you programmed in your office) can be sent to the duplication facility to serve as the reference from which mask-ROM masks will be generated. The data on the EEPROM can be trusted, because it sums to zero. And the mask ROMs themselves can be checked for flaws by seeing whether they sum to zero.
For smaller-scale ROM distribution, ROM-to-PROM bulk copiers are used. These copiers can be made to both pre-verify the source and post-verify the programmed copies. Using this approach to checksumming, the copier can avoid having to verify the source “against” the destination, instead only needing to verify the source once, and then verify the destinations against themselves. This both speeds up verification and allows for the use of simpler microcontrollers in these copiers, which reduces their design cost. (By quite a lot, back in the 1970s, when all this was most relevant.)
You can see this approach to checksumming in practice in early-generation game cartridge ROMs, which almost always have these embedded sum-complement values (and so presumably were integrity-verified during mastering/duplication.) These sum-complement value fields get referred to by emulators as “the checksum” of the ROM image—but technically, they’re not; if you’re following along, you’ll realize that “the checksum” of such ROM images is zero! :)
I was being kind of loose with terminology; technically, a “ROM image” is an image (i.e. a replica, like a disk image) of a ROM chip.
ROM is random-access for reads—it’s “memory” in the same sense that RAM is memory, wiring onto a device’s address bus and so becoming part of that device’s physical memory layout.
So when people say that a game-cartridge backup device or the like captures a “ROM image”, what they really mean is that it captures “a snapshot of what the mapped region of the address space that the ROM chip claims to map for — or seems to be wired to — looks like.” Sometimes there’s metadata in the ROM itself saying what region the ROM maps for. But since the ROM is just a physical chip sitting on the bus, it can map or not map for any address arbitrarily (as long as it has the correct address lines wired to discriminate that address from other addresses.)
This is what results in so-called “overdumps” — this is where a ROM chip doesn’t actually respond to all the read requests that its mapping claims it does, and thus, for some reads (usually the ones at the top end of the ROM’s address space) you don’t get a response from the ROM, leaving the data bus floating (“open bus”), giving you undefined data for those reads.
This is why I say that a ROM image is technically an image of the address space a ROM occupies as discovered by requesting those addresses, and not an image of the ROM’s contents per se: most ROM images are, in fact, overdumps. It’s just that more modern systems have pull-up resistors on the data bus to ensure that reads the ROM doesn’t deign to respond to read back as a fixed value (all ones, with pull-ups) rather than as floating garbage.
ROM copiers are really “ROM image” copiers — they work by programming the destination ROM(s) with the data discovered by probing the source ROM’s address space, as above. If the destination ROM is larger than the source ROM, the destination ROM will record an overdump of the source ROM.
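A rough sketch of that in Python, as a toy model rather than a description of any real copier: the ROM only answers addresses inside its own size, and every other read returns whatever the idle bus settles to. The `idle_value` of 0xFF is an assumption (what a pulled-up bus would give), and `read_bus` and `dump` are hypothetical names.

```python
def read_bus(rom: bytes, address: int, idle_value: int = 0xFF) -> int:
    """Return the byte seen on the data bus for a single read."""
    if address < len(rom):
        return rom[address]   # the ROM chip drives the bus
    return idle_value         # nothing answers; assumed pulled-up bus value


def dump(rom: bytes, window_size: int, idle_value: int = 0xFF) -> bytes:
    """Probe `window_size` consecutive addresses, as a dumper or copier would."""
    return bytes(read_bus(rom, addr, idle_value) for addr in range(window_size))


source_rom = b"\x01\x02\x03\x04"           # a 4-byte "ROM"
exact_image = dump(source_rom, 4)          # window matches the chip
overdump = dump(source_rom, 8)             # window larger than the chip
```

`exact_image` is just the ROM contents, while `overdump` is the ROM contents followed by four bytes of idle-bus filler. That filler is what a larger destination ROM would end up recording.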
All that being said, when originally programming an EEPROM, the ROM-programming device doesn’t actually interface to your computer as writable random-access memory. It interfaces as, essentially, a hybrid serial/block device — i.e. a device where you can either write (program) one byte to an arbitrary address, or write (program) a whole ROM-block (usually 64 bytes) at a time. You can also erase an entire block.
In other words, functionally, an EEPROM accessed through a programming device acts very similarly to flash memory accessed through a flash controller. (Flash memory is, in essence, an EEPROM technology with very fast writes trading off against slower, block-at-a-time reads rather than bus-speed byte-at-a-time reads.)
What that means, in practice, is that there’s no particular constraint on how you first program the data into the EEPROM you’re going to be mastering PROMs with. There’s no “ROM programmer file format”, any more than there’s a common file format used to descriptively represent the instructions the various mkfs(8) utils use to initialize filesystems onto a block device. Programming EEPROMs is a procedure, not data per se.
That being said, if we wanted to represent the process of programming an EEPROM using modern file formats, a CUE sheet (or equivalent) would probably be the best approach. A CUE sheet isn’t a description of the intended result, but rather a sequence of instructions for an abstract “burner” to go through to produce a result. Unlike a ROM image, which just tells you what you got when you tried to read from the addresses in an assumed-mapped memory region, a CUE sheet tells you what some other device originally tried to put at those addresses, and so lets you figure out which reads are “true” answers from the ROM, vs “open bus” answers, vs. de-facto responses from a pull-up resistor. (It also lets you emulate the process of cell wear, and so figure out which cells were intentionally “programmed to death”, allowing a faithful representation of “indeterminate state” addresses, much like the Applesauce image format[1] does for magnetic-flux media.)
So, to be clear, there's no defined file format for ROMs generally. You know the size of the EEPROM chip sitting in the programmer; you have some data you'd like to write (maybe in a file; maybe as a stream); as long as the data is no larger than the chip, you can just dd(1) the data, blockwise, onto the programmer block-device, and you'll get a programmed EEPROM.
But if you want to make this friendly to consumers (say, if the EEPROM is your computer's BIOS ROM), then you take a ROM image you've constructed some other way; wrap it in your own format with checksums and the like; and create a "flasher" program that first verifies the integrity of the ROM image against the checksum, and then dd(1)s it to the EEPROM programmer block-device. Usually the file extension OEMs decided on for these ROM-in-container files was ".bin". Doesn't mean anything; they were arbitrary formats, or sometimes not formats at all, just raw ROM images.
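As an illustration only, here is what such a wrapper and "flasher" might look like in Python. The container layout (a 4-byte little-endian CRC-32 followed by the raw image) is invented for this sketch and is not any OEM's actual format. An `io.BytesIO` stands in for the programmer block-device.

```python
import io
import struct
import zlib


def wrap(image: bytes) -> bytes:
    """Build the hypothetical container: CRC-32 header, then the raw image."""
    return struct.pack("<I", zlib.crc32(image)) + image


def flash(container: bytes, device) -> None:
    """Verify the container, then write the raw image to `device`."""
    (expected,) = struct.unpack_from("<I", container)
    image = container[4:]
    if zlib.crc32(image) != expected:
        raise ValueError("ROM image failed its integrity check; not flashing")
    device.write(image)       # stand-in for dd(1) onto the programmer


device = io.BytesIO()         # pretend EEPROM programmer
flash(wrap(b"BIOS"), device)

damaged = bytearray(wrap(b"BIOS"))
damaged[-1] ^= 0x01           # corrupt one bit of the payload
try:
    flash(bytes(damaged), io.BytesIO())
    rejected = False
except ValueError:
    rejected = True
```

Unlike the sum-complement approach, the verifier here has to understand the container well enough to find the expected checksum. That is the price of not embedding the check value in the image itself.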
Thanks for the wonderfully detailed reply. I had a follow-up question: does the ROM designer, or any part of the ROM itself, ever have to know where in memory it is mapped?
According to https://en.wikipedia.org/wiki/Intel_8086: "The architecture was defined by Stephen P. Morse with some help and assistance by Bruce Ravenel (the architect of the 8087) in refining the final revisions. Logic designer Jim McKevitt and John Bayliss were the lead engineers of the hardware-level development team and Bill Pohlman the manager for the project." I expect the microcode was developed in tandem with the rest of the chip, so probably took about 2 years.