by big-chungus4
19 days ago
How does x.cos().cos() work faster than doing two cos calls separately? Like, the first cos call returns a tensor either way; the only difference is that it's not assigned to a variable. But how is it even possible to know that difference in Python?
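Non-fused: each cos is its own sweep over the array, so the temporary "t" is written out to memory by the first call and read back by the second.

Fused, no intermediate variable: the temporary "t" doesn't leave the GPU. Sweeping the array twice makes you twice as dependent on memory bandwidth.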
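To make the two cases concrete, here is a rough CPU-side toy model in plain Python. It is not GPU code and not what any particular framework does internally; the function names and the `traffic` counter (one unit per element read from or written to an array) are made up for illustration.

```python
import math

def cos_unfused(x):
    """Two sweeps: the temporary t is fully materialized between them."""
    traffic = 0          # hypothetical counter: array element reads + writes
    t = []
    for v in x:          # sweep 1: read x[i], write t[i]
        t.append(math.cos(v))
        traffic += 2
    y = []
    for v in t:          # sweep 2: read t[i], write y[i]
        y.append(math.cos(v))
        traffic += 2
    return y, traffic

def cos_fused(x):
    """One sweep: t lives only in a local variable (think: a register)."""
    traffic = 0
    y = []
    for v in x:          # read x[i], write y[i]; t is never stored in an array
        t = math.cos(v)
        y.append(math.cos(t))
        traffic += 2
    return y, traffic

x = [0.1 * i for i in range(1000)]
y1, traffic1 = cos_unfused(x)
y2, traffic2 = cos_fused(x)
```

Both versions compute the same values, but under this model the unfused one touches array memory twice as often. For a cheap elementwise op like cos, that memory traffic tends to dominate the runtime rather than the arithmetic.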