by teo_zero 30 days ago
The author forgot to add "fused" here, like they did in other parts of the same section.

Non-fused:

  foreach i
    y[i] = cos(x[i])
  foreach i
    z[i] = cos(y[i])
Fused, no intermediate array:

  foreach i
    t = cos(x[i])
    z[i] = cos(t)
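The temporary "t" never leaves the GPU (it can typically stay in a register), so only x is read and only z is written. The non-fused version also writes y out and reads it back, i.e. four array transfers instead of two, which makes you twice as dependent on memory bandwidth.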
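
For anyone who wants to poke at it, here is the same pair of loops as a minimal runnable sketch in plain Python. The function names are made up for illustration, and Python on a CPU won't show the bandwidth difference; it only demonstrates that the two forms compute the same thing while the fused one never materializes y.

```python
import math

def cos_cos_unfused(x):
    # Pass 1: materialize the intermediate array y (an extra write).
    y = [math.cos(v) for v in x]
    # Pass 2: read y back (an extra read) and produce z.
    z = [math.cos(v) for v in y]
    return z

def cos_cos_fused(x):
    # Single pass: the intermediate is a scalar t, no array y exists.
    z = []
    for v in x:
        t = math.cos(v)
        z.append(math.cos(t))
    return z

x = [0.0, 0.5, 1.0, 2.0]
# Same operations in the same order, so the results match exactly.
print(cos_cos_unfused(x) == cos_cos_fused(x))
```

On a real GPU the fused body would be a single kernel with t as a per-thread local, which is what lets you skip the round trip through memory.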