| HN Mirror



	by infberg 1712 days ago
	Care to explain? -O3 generates larger code than -O2?


pclmulqdq 1712 days ago

Yes, -O3 tends to include a lot of features that increase code size, like aggressive loop unrolling. If you are jumping around a large amount of code, -O3 generally performs more poorly than -O2, but if you are running a tight loop (like HPC code), -O3 is better.

In the past, when I worked on a very performance-sensitive codebase that was also limited in scope, we compiled with -Os and did all the loop optimizations we wanted manually (and with pragmas). That produced faster code than -O2 or -O3.
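A minimal sketch of what per-loop unrolling with a pragma can look like, assuming GCC 8 or later, where #pragma GCC unroll is available. The function name dot and the unroll factor 4 are illustrative only and are not taken from the codebase described above; whether the pragma actually beats the compiler's default choice has to be measured.

```c
#include <stddef.h>

/* Hypothetical hot function: dot product of two float arrays.
 * The idea is to build the whole file for size and ask for
 * unrolling only on the loops that are known to be hot. */
float dot(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;

    /* Request that GCC unroll only this loop, by a factor of 4.
     * Compilers that do not know the pragma ignore it (possibly
     * with a warning), so the code stays portable. */
    #pragma GCC unroll 4
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];

    return acc;
}
```

The pragma is a request for one loop, so the rest of the translation unit keeps whatever the command-line optimization level chose.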


gnufx 1712 days ago

Regarding unrolling, in GCC -O3 includes -floop-unroll-and-jam but not -funroll-loops. You may want one or the other, maybe both, depending on circumstances. I don't see much benefit from the available pragmas on HPC-type code except for OpenMP, and "omp simd" isn't necessary to get vectorization in the places where I've seen people say it is. Mileage always varies somewhat, of course. (Before second-guessing anything, use -fopt-info.)
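As a rough illustration of checking the compiler's decisions instead of guessing, here is a simple loop that GCC can typically vectorize without any "omp simd" pragma. The function name saxpy and the file name in the comment are hypothetical; the command line assumes GCC, where -fopt-info-vec reports the loops that were vectorized.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 *
 * One way to see what the optimizer did (assuming GCC and a file
 * named saxpy.c):
 *
 *     gcc -O3 -fopt-info-vec -c saxpy.c
 *
 * The restrict qualifiers promise that x and y do not overlap,
 * which removes one common obstacle to vectorization. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

If the report shows that the loop was not vectorized, -fopt-info-vec-missed is the GCC option that prints the reasons.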


boibombeiro 1712 days ago

Modern x86 CPUs have micro-op caches that can hold small loops (about 50 instructions) and medium loops (~2k instructions). Also, the bottleneck is usually instruction decoding (Alder Lake made huge changes to that, so this might change).

In other words, loop unrolling is, more often than not, harmful.


jleahy 1712 days ago

It’s a shame that -Os can sometimes produce truly awful code. There are a few optimisations in there that trade a byte for a massive slowdown.


userbinator 1712 days ago

You asked for minimum size, and that's what you got. I'd say that's working as it should.

A more granular control over optimisation would be good, however.


jleahy 1712 days ago

Probably just some tweaks to -O2 would be enough. After all, people are selecting -Os over -O2 because they see better performance, and that should not be happening.


kevin_thibedeau 1712 days ago

You can enable/disable individual optimizations. How much more granular do you need?
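A sketch of what toggling individual optimizations can look like, assuming GCC. On the command line this is a matter of adding flags on top of a level, for example gcc -O2 -funroll-loops -fno-tree-vectorize. GCC also has a per-function optimize attribute, shown below; its documentation cautions that the attribute is intended mainly for debugging, so treat this as an illustration only. The macro and function names are made up.

```c
#include <stddef.h>

/* The optimize attribute is GCC-specific, so it is hidden behind a
 * macro that expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define UNROLL_LOOPS __attribute__((optimize("unroll-loops")))
#else
#define UNROLL_LOOPS
#endif

/* Hypothetical hot function: -funroll-loops is enabled for this
 * function only, whatever the rest of the file is built with. */
UNROLL_LOOPS
long sum_squares(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)v[i] * v[i];
    return total;
}
```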


jhgb 1712 days ago

Surely a profile-guided build should be able to only apply -Os to those functions where it doesn't cause a lot of problems.


pclmulqdq 1712 days ago

In the application I referred to, PGO was also used. However, PGO only applies -Os to cold code, and if what you're doing is very branchy, -Os can help even in the hot path.
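Without a profile, a similar hot/cold split can be hinted by hand. The sketch below assumes GCC or Clang and uses the cold function attribute, which GCC documents as causing the function to be optimized for size rather than speed. The function names are hypothetical, and this is a manual approximation of one thing PGO does, not a substitute for it.

```c
#include <stddef.h>

/* cold marks a function as unlikely to run; noinline keeps its body
 * out of the hot caller. Both are GCC/Clang extensions. */
#if defined(__GNUC__)
#define COLD_PATH __attribute__((cold, noinline))
#else
#define COLD_PATH
#endif

/* Hypothetical rarely-taken error path. */
COLD_PATH
int handle_negative_input(void)
{
    return -1;
}

/* Hypothetical hot path: sums non-negative values and bails out to
 * the cold function on the first negative one. */
int checked_sum(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] < 0)
            return handle_negative_input();
        total += v[i];
    }
    return total;
}
```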
