by duckingtest 4295 days ago
>In general, you might be better off always using long unless there is a specific reason not to

I don't agree. That's trading away useful cache space for almost invisible micro-optimizations.

As a general rule, it's always better to use the smallest size possible.


I phrased it as "might be" because I realize it's controversial, but I think that rule is outdated. Yes, having a large static array would be a fine specific reason to use the smallest size possible. But what would be the benefit when you are dealing with the argument to a function in an example like this?

Most of the time the argument starts in a register, is passed in a register, and returned in a register. Using a smaller size often just means that the compiler adds some unneeded conversions as in this case. Usually this doesn't matter, but when it does, the benefit is almost always in favor of the simpler rule of always using 64-bit variables.
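To make the "unneeded conversions" point concrete, here is a minimal sketch in C. The function names `pick_int` and `pick_long` are hypothetical and not from the article under discussion. It assumes an LP64 target such as x86-64 Linux, where `int` is 32 bits and `long` is 64 bits. On such a target a compiler will typically widen the 32-bit index before it can use it in a 64-bit address calculation, while the 64-bit index can be used directly. Whether the extra instruction actually appears depends on the compiler, flags, and inlining, so check the generated assembly before drawing conclusions.

```c
#include <stdint.h>

/* 32-bit signed index: the argument arrives in the low half of a
 * 64-bit register, so it usually has to be sign-extended before it
 * can take part in the address computation. */
int64_t pick_int(const int64_t *table, int i)
{
    return table[i];
}

/* 64-bit index (on LP64): already register-width, so the load can
 * normally use it as-is with no conversion instruction. */
int64_t pick_long(const int64_t *table, long i)
{
    return table[i];
}
```

Both functions return the same value for any valid index. The only difference under discussion is the possible extra widening instruction in the first one, which matters only in very hot code.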

One argument? Sure, probably no difference. When you start using longs as local variables and arguments, even if they're all in registers in one function, if you call other functions inside they are going to be pushed on the stack. It all adds up and suddenly you're getting L1 cache misses.
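A rough back-of-the-envelope sketch of the "it all adds up" arithmetic, in C. Everything here is an illustrative assumption: a 64-byte cache line, eight spilled locals per frame, and the hypothetical helpers `spill_bytes` and `lines_needed`. Real frames also hold return addresses, padding, and saved registers, and compilers often keep values in registers rather than spilling them, so treat the numbers as an upper-bound illustration rather than a measurement.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common on current x86-64 parts. */
enum { CACHE_LINE_BYTES = 64 };

/* Hypothetical sets of eight locals spilled per call frame. */
struct locals32 { int32_t v[8]; };  /* 8 * 4 = 32 bytes */
struct locals64 { int64_t v[8]; };  /* 8 * 8 = 64 bytes */

/* Total bytes of spilled locals across `depth` nested calls. */
size_t spill_bytes(size_t depth, size_t frame_bytes)
{
    return depth * frame_bytes;
}

/* Number of cache lines needed to hold `bytes`, rounded up. */
size_t lines_needed(size_t bytes)
{
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

Under these assumptions, 16 nested calls touch 8 cache lines with 32-bit locals and 16 lines with 64-bit locals. That is twice the footprint, though still small next to a typical 32 KiB L1 data cache.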

Anyway, unnecessary conversions mostly go away when you use link-time optimization (-fwhole-program or -flto in GCC).

A reasonable argument, although not one I seem to run up against, perhaps because I'm rarely concerned about high performance when writing functions with that many layers of subcalls.

By contrast, when trying to optimize inner loops, I frequently encounter cases where the front-end limit of 4 micro-ops per cycle is the bottleneck, and getting rid of any extraneous instruction is a speedup. And rather than worrying about a deep stack causing L1 data misses, I'm more concerned with missing the L1 instruction cache, or with the extra micro-ops causing me to miss the ~1000-slot decoded micro-op cache.

These concerns are clearly at opposite ends of the performance spectrum, and which should dominate probably depends on the problem at hand.

(I glanced at your comment history. Welcome to HN! You have good insights. Please stick around.)