by teo_zero
30 days ago
The author forgot to add "fused" here, like they did in other parts of the same section.

Non-fused:

  foreach i
    y[i] = cos(x[i])
  foreach i
    z[i] = cos(y[i])

Fused, no intermediate array:

  foreach i
    t = cos(x[i])
    z[i] = cos(t)

The temporary "t" doesn't leave the GPU. Sweeping the array twice makes you twice as dependent on memory bandwidth.
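
To put rough numbers on the bandwidth point, here is a minimal CPU-side sketch in plain Python. It is not GPU code: the `traffic` counter is a made-up stand-in for loads and stores to array memory, and the names `unfused` and `fused` are only for illustration.

```python
import math

def unfused(x):
    """Two sweeps: the intermediate array y is written out, then read back."""
    traffic = 0  # hypothetical counter of array element loads and stores
    y = [0.0] * len(x)
    for i in range(len(x)):
        y[i] = math.cos(x[i])
        traffic += 2  # load x[i], store y[i]
    z = [0.0] * len(x)
    for i in range(len(x)):
        z[i] = math.cos(y[i])
        traffic += 2  # load y[i], store z[i]
    return z, traffic

def fused(x):
    """One sweep: the temporary t is a scalar and never touches array memory."""
    traffic = 0
    z = [0.0] * len(x)
    for i in range(len(x)):
        t = math.cos(x[i])  # stands in for a value kept in a register
        z[i] = math.cos(t)
        traffic += 2  # load x[i], store z[i]
    return z, traffic

x = [0.1 * i for i in range(1000)]
z_unfused, traffic_unfused = unfused(x)
z_fused, traffic_fused = fused(x)
print(traffic_unfused, traffic_fused)
```

Both versions produce the same z, but under this counting model the non-fused one moves 4 array elements per index and the fused one moves 2, which is where the "twice as dependent" figure comes from.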