Wow thanks for the pointer! That seems unfortunate, I wonder if there is a way to evaluate whether the extra jump is actually worth it, and whether this optimization could be allowed.
If I remove the check for `OptForSize` and `PredTBB == MBB`, we can optimize OP's report, but seem to regress basic tests like llvm/test/CodeGen/X86/conditional-tailcall.ll, literally flipping the branches incorrectly IIUC.
Why is that a problem? I'd figure that a short jump is basically free since the target is likely to be in the instruction cache? Is it an issue of readahead/speculation through multiple jumps?