Hacker News new | ask | show | jobs
by throwawaylinux 1609 days ago
It doesn't, if your L1$ predecodes at fill time and stores instruction length.

Something, somewhere does have to do a serial length decoding of course. But when you look at the L2 access latency and throughput (which is the minimum L1 fill latency), it's clear you could afford to do that part of the decode over more cycles.

New designs are not just predecoding lengths but entire uops now into the first level instruction cache which is the same concept they just call it a L0 and L1 rather than L1 and L2.