by jra101
1336 days ago
Have you tried reducing the register count in your FP32 FMA test by increasing the iteration count and reducing the number of values computed per loop? Instead of computing 8 independent values, compute one with 8x more iterations:

  for (int i = 0; i < count * 8; i++) {
    v0 += acc * v0;
  }

That, plus inlining the iteration count so the compiler can unroll the loop, might help get closer to SOL (speed of light, i.e. the theoretical peak).
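For illustration, here is a rough C++ sketch of that loop shape with the trip count baked in at compile time. It is not the original test: the real one would be a GPU kernel or shader, where the equivalent is probably a define or a specialization constant. The name `fma_chain` and the template parameter `Iterations` are made up for this example.

```cpp
// Hypothetical helper, not from the original benchmark.
// A single dependent multiply-add chain: only v0 and acc stay live,
// so register pressure is minimal compared to 8 independent accumulators.
// Iterations is a template parameter, so the trip count is a compile-time
// constant and the compiler is free to unroll the loop.
template <int Iterations>
constexpr float fma_chain(float v0, float acc) {
    for (int i = 0; i < Iterations; i++) {
        v0 += acc * v0;  // same loop body as in the comment above
    }
    return v0;
}

// Stand-in for `count` in the comment; the value 4 is arbitrary.
constexpr int kCount = 4;

// One value with 8x the iterations instead of 8 values with kCount each.
constexpr float kResult = fma_chain<kCount * 8>(1.0f, 0.0f);
```

On a GPU, a single dependent chain per thread relies on having enough threads in flight to hide FMA latency, so whether this actually gets closer to peak depends on occupancy.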