|
|
|
|
|
by inkyoto
781 days ago
|
|
> Not fully decoded though, since it's enough to look at the lower bits to determine instruction size. It is not about decoding, which happens later, it is about 32-bit instructions crossing the L1 cache line boundary in the L1-i cache which happens first. Instructions are fetched from the L1-i cache in bundles (i.e. cache lines), and the size of the bundle is fixed for a specific CPU model. In all RISC CPU's, the size of a cache line is a multiply of the instruction size (mostly 32 bits). The RISC-V C extension breaks the alignment, which incurs a performance penalty for high performance CPU implementations, but is less significant for smaller, low power implementations where performance is not a concern. If a 32-bit instruction cross the cache line boundary, another cache line must be fetched from the L1-i cache before an instruction can be decoded. The performance penalty in such a scenario is prohibitive for a very fast CPU core. P.S. Even worse if the instruction crosses a page boundary, and the page is not resident in memory. |
|
This does of course delay decoding this one instruction by a cycle, but you already have that for instructions which are fully in the next line anyways (and aligning branch targets at compile time improves both, even if just to a fixed 4 or 8 bytes).