by ArchD
1284 days ago
In this modified version, where the for-loops are converted to do-loops (a possibly unsafe or incorrect transformation), the compiler still emits multiple unnecessary vpbroadcastd's for ymm1: https://godbolt.org/z/61jYejsra With the original for-loop version, one could argue that if no loop iteration runs at all, the vpbroadcastd's can be skipped entirely, and that generating extra code for each combination of empty and non-empty loops just to avoid a redundant vpbroadcastd is not worth it (e.g. because longer code increases icache pressure). With the do-loop variant, both loops run at least one iteration, so the compiler could simply do the vpbroadcastd once, unconditionally, before the first loop. It somehow fails to realize that ymm1 is not clobbered between the two vpbroadcastd's.
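For readers who cannot open the link, here is a minimal sketch of the kind of for-loop versus do-loop pair I mean. It is not the code behind the godbolt link: the function names add_for and add_do, the element type, and the loop bodies are all made up for illustration. I am only assuming that a vectorizing compiler targeting AVX2 would broadcast k into a vector register for loops of this shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in, NOT the code from the linked example.
// for-loop version: either loop may execute zero iterations, so a
// broadcast of k hoisted above the first loop could go unused.
void add_for(std::int32_t* a, std::size_t na,
             std::int32_t* b, std::size_t nb, std::int32_t k) {
    for (std::size_t i = 0; i < na; ++i) a[i] += k;
    for (std::size_t i = 0; i < nb; ++i) b[i] += k;
}

// do-loop version: each body runs at least once, so one broadcast of k
// before the first loop would always be used by both loops.
// Callers must pass na >= 1 and nb >= 1; this is the "possibly unsafe" part.
void add_do(std::int32_t* a, std::size_t na,
            std::int32_t* b, std::size_t nb, std::int32_t k) {
    assert(na > 0 && nb > 0);
    std::size_t i = 0;
    do { a[i] += k; } while (++i < na);
    i = 0;
    do { b[i] += k; } while (++i < nb);
}
```

Both functions give the same results whenever both lengths are nonzero; the only difference is that add_do promises the compiler that neither loop is empty.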