|
|
|
|
|
by mbitsnbites
882 days ago
|
|
BTW... Except for the indications from the die shots, one of the reasons that I don't think that uOPs can be as small as 32 bits is that studying fixed width ISAs and designing MRISC32 have made me appreciate the clever encoding tricks that go into fitting all instructions into 32 bits. Many of the encoding tricks require compiler heuristics, and you don't want to do that in hardware. E.g. consider the AArch64 encoding of immediate values for bitwise operations. Also, even if you manage to do efficient instruction encoding in hardware, you will probably end up in a situation where you need to add an advanced decoder after the uOP cache, which does not make much sense. The main thing that x86 has going for it in this regard is that most instructions use destructive operands, which probably saves a bunch of bits in the uOP encoding space. But still, it would make much more sense to use more than 32 bits per uOP. |
|
Keep in mind that the average RISC ISA uses 5 bit registers IDs and uses three-arg form for most instructions, that's 15 bits gone. While AMD64 uses 4 bit register IDs and uses two-arg form for most instructions, which is only 8 bits.
Also, the encoding scheme that Agner describes is not a fixed width encoding. It's variable width with 16bit, 32bit, 48bit and 64bit uops. There are also some (hopefully rare) uops which don't fit in the uop cache's encoding scheme (forcing a fallback to the instruction decoders). Those two relief valves allow such an encoding to avoid the needs for the complex encoding tricks of a proper fixed width encoding.
So I find the scheme to be plausible, though what you say about decoders after the uOP cache is true.