by kijiki 721 days ago
It decodes to uOPs optimized for the exact microarchitecture of that particular CPU. High-performance ARM64 designs do the same.

But the specific job of tracking variable-length instruction boundaries happens in the L1i cache. uOP caches make decode bandwidth less critical, but it is still important enough to optimize.
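
To make the boundary-tracking idea concrete, here is a rough sketch of marking instruction starts once, when a line is filled, so later decodes don't have to rediscover them serially. Everything in it is made up for illustration: the toy encoding (low two bits of the first byte give the length), the names `toy_length`, `predecode` and `instruction_starts`, and the assumption that the line begins on an instruction boundary. It is not how any particular x86 core lays out its L1i.

```python
from typing import Callable, List


def toy_length(first_byte: int) -> int:
    """Hypothetical encoding: low two bits of the first byte give a length of 1..4."""
    return (first_byte & 0b11) + 1


def predecode(line: bytes, length_of: Callable[[int], int] = toy_length) -> List[bool]:
    """Return one start-of-instruction marker per byte of a fetched line.

    This is the serial walk: each length must be known before the next
    start can be found. Assumes byte 0 of the line begins an instruction.
    """
    starts = [False] * len(line)
    pos = 0
    while pos < len(line):
        starts[pos] = True
        pos += length_of(line[pos])
    return starts


def instruction_starts(markers: List[bool]) -> List[int]:
    """With the markers stored next to the bytes, every start is visible at once."""
    return [i for i, is_start in enumerate(markers) if is_start]


# Four instructions of lengths 1, 2, 3 and 4 packed into a 10-byte "line".
line = bytes([0x00, 0x01, 0xFF, 0x02, 0xAA, 0xBB, 0x03, 0x01, 0x02, 0x03])
markers = predecode(line)
print(instruction_starts(markers))
```

The serial loop runs once per fill; after that, anything reading the line can pick out all the starts from the stored markers and hand them to parallel decoders.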