by celrod
2178 days ago
For gcc you need `-funroll-loops` to unroll and `-fvariable-expansion-in-unroller` to get multiple accumulation vectors. By default, it'll only use 2 accumulation vectors. You can set it to 4 (for example) with `--param max-variable-expansions-in-unroller=4`.

`-funroll-loops` unrolls 8 times by default. Unrolling beyond the number of accumulators is wasteful for simple operations like dot products or summations, so you may want to control that with `--param max-unroll-times=4`.

Thus, the following works and will generally produce faster code than LLVM (because LLVM doesn't vectorize the remainder, which can leave you with a large number of scalar operations): `-Ofast -funroll-loops --param max-unroll-times=4 -fvariable-expansion-in-unroller --param max-variable-expansions-in-unroller=4`

Godbolt: https://godbolt.org/z/4PXSqs
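For anyone who wants to try the flags locally, here is a minimal sketch of the kind of reduction loop they apply to. The function name `dot` and the file name `dot.c` are my own placeholders, not taken from the Godbolt link, and the exact assembly you get will depend on your gcc version and target.

```c
#include <stddef.h>

/* Plain dot product: one scalar accumulator in the source.
 * With -Ofast gcc is allowed to reassociate the floating-point sum,
 * which is what lets the vectorizer and unroller split it into
 * several independent accumulation vectors. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Compiling this to assembly with something like `gcc -S dot.c` plus the flags listed above, then counting the distinct vector registers used for accumulation in the main loop, should show the effect of changing `max-variable-expansions-in-unroller`.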