by gnufx 2167 days ago
That's interesting, and I should take a closer look later. I've long since stopped following GCC optimizations, I'm not good with assembler, and I never remember how the dumps work. The interaction between unroll-and-jam and plain unrolling is particularly confusing. (Here dot is the unit-stride loop; gfortran-10 reports a bit more than gfortran-8 does, and more realistic AVX targets behave differently, of course.)

  $ gfortran-10  -c dot.f90 -Ofast -fopt-info  
  dot.f90:3:7: optimized: loop vectorized using 16 byte vectors
  dot.f90:3:7: optimized: loop with 2 iterations completely unrolled (header execution count 64530389)

  $ gfortran-10  -c dot.f90 -Ofast -fopt-info  -funroll-loops --param max-unroll-times=4 -fvariable-expansion-in-unroller --param max-variable-expansions-in-unroller=4
  dot.f90:3:7: optimized: loop vectorized using 16 byte vectors
  dot.f90:3:7: optimized: loop with 2 iterations completely unrolled (header execution count 64530389)
  dot.f90:4:0: optimized: loop unrolled 3 times
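For anyone unsure what the extra flags are asking for: GCC documents -fvariable-expansion-in-unroller as creating multiple copies of some local variables when unrolling. For a dot product that means splitting the single accumulator into several partial sums. Below is a rough hand-written C analogue, assuming an unroll factor of 4 and double-precision data. It is not what gfortran actually emits, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Plain unit-stride dot product: one accumulator, so each iteration's
   add depends on the previous iteration's result through s. */
double dot_plain(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Hypothetical hand-unrolled version (factor 4) with "variable expansion":
   the accumulator is split into four independent partial sums, so the four
   multiply-adds in each unrolled body don't wait on one another.
   Summing in a different order changes floating-point rounding, which is
   why a compiler only does this when reassociation is allowed (-Ofast). */
double dot_unroll4_expanded(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    /* Remainder loop for when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += x[i] * y[i];
    /* Combine the partial sums at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

With integer-valued inputs both versions give identical results. With general data they can differ in the last bits, which is the trade-off -Ofast accepts.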