Interesting question. I think most of the optimizations described in the BOLT paper are fairly hardware-agnostic: branch prediction does not depend on the architecture, etc. But I'm not an expert on microarchitectures.
A lot of the benefit of BOLT comes from fixing the basic-block layout so that taken branches go backward and not-taken branches go forward, which keeps the hot path as straight-line fall-through code. This is CPU-neutral.
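
To make the layout idea concrete, here is a rough sketch of profile-guided block ordering. It is not BOLT's actual algorithm, just a greedy simplification I wrote for illustration: the function name `layout_blocks`, the block names, and the edge counts are all made up. After each block it places the hottest not-yet-placed successor next, so the common path falls through and rarely executed blocks end up at the end.

```python
def layout_blocks(entry, edges):
    """Greedy hot-path layout (illustrative only, not BOLT's real algorithm).

    edges maps (src, dst) -> execution count taken from a profile.
    Returns the blocks in the order they would be emitted.
    """
    # Collect blocks in first-seen order so leftovers are placed deterministically.
    all_blocks = [entry]
    for src, dst in edges:
        for block in (src, dst):
            if block not in all_blocks:
                all_blocks.append(block)

    order = []
    placed = set()
    for start in all_blocks:
        block = start
        # Extend a chain: keep following the hottest successor that is still unplaced.
        while block is not None and block not in placed:
            order.append(block)
            placed.add(block)
            succs = [(count, dst) for (src, dst), count in edges.items()
                     if src == block and dst not in placed]
            block = max(succs, key=lambda s: s[0])[1] if succs else None
    return order


# Hypothetical profile: the "error" path is almost never executed.
edges = {
    ("entry", "check"): 100,
    ("check", "error"): 1,
    ("check", "work"): 99,
    ("work", "exit"): 99,
    ("error", "exit"): 1,
}
print(layout_blocks("entry", edges))
```

With this made-up profile the hot chain entry, check, work, exit is laid out contiguously and the cold `error` block is pushed to the end, so the branch in `check` is not taken in the common case. Nothing in the ordering decision looks at the target CPU, only at the edge counts.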