| >> "several million" undocumented instructions.. is that right? Bear in mind that doesnt really mean that there are several million operations / opcode mnemonics which are undocumented but each distinct instructions. It is more likely they are "loose" decodings of other instructions, where changing a single bit of the opcode still causes the CPU to decode the same instruction. Toy example: If I encode my (imaginary ISA) 8bit instruction for "ADD EAX EBX" as 0101_X000 where X is "don't care" then regardless of whether the core gets 0101_0000 or 0101_1000 , it will still execute the ADD instruction. Now imagine your instructions can be upto 16 bytes long, and you see how loose decoding can lead to a lot of instructions which are undocumented, but that the processor is perfectly happy to execute. |
A CPU is, at root, a massive Boolean circuit wrapped in a flip-flop and some persisted state. The binary [sequence corresponding to an] opcode is just an input that determines which inputs go where.
Thus, for an n-bit opcode width, there are 2^n valid opcodes. Only m of them will correspond to intelligible, "I might want to use that some day" instructions. (Others, as you note, will be functionally equivalent versions of the m and ignorable as well.)
And so you will have 2^n - m "undocumented opcodes".
[1] based on reading NAND to Tetris