Hacker News new | ask | show | jobs
by atiedebee 68 days ago
I made a program with some inline assembly and tried O3 with clang once. Because the assembly was in a loop, the compiler probably didn't have enough information on the actual code and decided to fully unroll all 16 iterations, making performance drop by 25% because the cache locality was completely destroyed. What I'm trying to say, is that loop unrolling is definitely not a guarantee for faster code in exchange for binary size
1 comments

Large blocks of inline assembly also destroy -O3. The compiler treats the asm statement as being essentially empty and makes decisions around it. Most inline asm is 1 instruction, so this is usually safe.