by camel-cdr
781 days ago
> me but cannot tell the boundary of instructions until they are all decoded

Not fully decoded, though: it's enough to look at the lower bits to determine the instruction size.

> Sure, it can be done, but that doesn't mean that it doesn't cost you in mispredict penalties

What does decoding have to do with mispredict penalties?

> Example: branch and jump offsets are painfully small

Yes, that's what the 48-bit instruction encoding is for.
See e.g. what the scalar efficiency SIG is currently working on: https://docs.google.com/spreadsheets/u/0/d/1dQYU7QQ-SnIoXp9v...
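To illustrate the "lower bits" point, here is a minimal Python sketch, assuming the length-encoding convention from the RISC-V unprivileged spec: a 16-bit instruction has bits [1:0] != 11, a 32-bit one has bits [1:0] == 11 and bits [4:2] != 111, and the 48-bit and 64-bit patterns come from the spec's expanded length encoding. The helper names `insn_length` and `boundaries` are hypothetical. Only the first 16-bit parcel of each instruction is examined, so boundaries are found without decoding opcodes or operands. A core that implements only the base ISA plus C needs just bits [1:0].

```python
from typing import List, Optional


def insn_length(parcel: int) -> Optional[int]:
    """Return the instruction length in bytes, looking only at the first
    (lowest-addressed) 16-bit parcel of a RISC-V instruction."""
    if (parcel & 0b11) != 0b11:
        return 2  # compressed (C extension): bits[1:0] != 11
    if (parcel & 0b11100) != 0b11100:
        return 4  # standard 32-bit: bits[4:2] != 111
    if (parcel & 0b111111) == 0b011111:
        return 6  # 48-bit: bits[5:0] == 011111
    if (parcel & 0b1111111) == 0b0111111:
        return 8  # 64-bit: bits[6:0] == 0111111
    return None  # longer or reserved encodings are not handled in this sketch


def boundaries(code: bytes) -> List[int]:
    """Return the start offset of every instruction in a little-endian code
    buffer. An instruction whose tail runs past the end of `code` is still
    reported, since its start is known from the first parcel alone."""
    starts: List[int] = []
    pc = 0
    while pc + 2 <= len(code):
        parcel = int.from_bytes(code[pc:pc + 2], "little")
        length = insn_length(parcel)
        if length is None:
            raise ValueError(f"unsupported length encoding at offset {pc}")
        starts.append(pc)
        pc += length
    return starts


if __name__ == "__main__":
    # c.nop (0x0001), nop / addi x0, x0, 0 (0x00000013), c.nop
    code = bytes([0x01, 0x00, 0x13, 0x00, 0x00, 0x00, 0x01, 0x00])
    print(boundaries(code))  # [0, 2, 6]
```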
It is not about decoding, which happens later; it is about 32-bit instructions crossing a cache line boundary in the L1-i cache, which happens first.
Instructions are fetched from the L1-i cache in bundles (i.e. cache lines), and the bundle size is fixed for a given CPU model. In RISC CPUs, the cache line size is a multiple of the instruction size (mostly 32 bits), so an aligned instruction never straddles two lines. The RISC-V C extension breaks this alignment, since 32-bit instructions then only need to start on a 16-bit boundary. That incurs a performance penalty for high-performance CPU implementations, but is less significant for small, low-power implementations where performance is not a concern.
If a 32-bit instruction crosses a cache line boundary, the next cache line must also be fetched from the L1-i cache before the instruction can be decoded. The performance penalty in such a scenario is prohibitive for a very fast CPU core.
P.S. It is even worse if the instruction crosses a page boundary and the second page is not resident in memory.
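The alignment argument can be checked with a small Python sketch. It assumes 64-byte L1-i cache lines and 4 KiB pages (typical values, not universal), and the helpers `crosses` and `crossing_offsets` are hypothetical. It only shows *where* a 32-bit instruction can straddle a line or page: with 4-byte alignment it never can, while with the 2-byte alignment allowed by the C extension it can at offset 62 of each 64-byte line. How much that costs depends on the fetch implementation and is not modeled here.

```python
from typing import List

LINE_SIZE = 64    # assumption: 64-byte L1-i cache lines
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def crosses(addr: int, length: int, block: int) -> bool:
    """True if the bytes [addr, addr + length) touch two different
    block-aligned regions (cache lines or pages)."""
    first = addr // block
    last = (addr + length - 1) // block
    return first != last


def crossing_offsets(length: int, align: int, block: int) -> List[int]:
    """Offsets within one block where an instruction of `length` bytes,
    placed at a multiple of `align`, would straddle into the next block."""
    return [a for a in range(0, block, align) if crosses(a, length, block)]


if __name__ == "__main__":
    print(crossing_offsets(4, 4, LINE_SIZE))  # [] : 4-byte aligned, never crosses
    print(crossing_offsets(4, 2, LINE_SIZE))  # [62] : possible with the C extension
    print(crosses(4094, 4, PAGE_SIZE))        # True : straddles two pages
```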