by phdelightful
456 days ago
I compiled it for Ampere and counted 6834 actual F32 operations in the SASS after optimizations. I only counted FFMA, FADD, FMUL, FMNMX, and MUFU.RSQ after eyeballing the SASS code, so there might even be more. It's possible the FMNMX doesn't actually take a FLOP since you can do f32 max as an integer operation, and perhaps MUFU.RSQ doesn't either, but even if you only count FFMA, FADD, and FMUL there are still 3685 ops.

  nvcc -arch=sm_86 prospero.cu -o prospero
  cuobjdump -sass prospero | grep -E 'FFMA|FADD|FMUL|FMNMX|MUFU\.RSQ' | wc -l
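
If you want the per-opcode breakdown rather than one total, something like this should work. It's only a sketch: `count_f32_ops` is a name I made up, the opcode list is the same eyeballed one as above, and `grep -o` counts every match rather than every matching line, so the total could differ from the `wc -l` figure if a line ever contained two of these mnemonics.

```shell
# Hypothetical helper: tally each F32 opcode found in SASS text on stdin.
# The opcode list is the hand-picked one from the comment, not an exhaustive
# list of floating-point instructions.
count_f32_ops() {
  # -o prints each match on its own line, so variants such as FFMA.FTZ
  # are folded into FFMA before counting.
  grep -oE 'FFMA|FADD|FMUL|FMNMX|MUFU\.RSQ' | sort | uniq -c | sort -rn
}

# Usage (assumes the binary built by the nvcc command above):
#   cuobjdump -sass prospero | count_f32_ops
```

Dropping FMNMX and MUFU\.RSQ from the pattern gives the narrower FFMA/FADD/FMUL count.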