by wvenable 1425 days ago
The article states that it's one cycle and slot per NOP.

Modern x86 processors decode multiple instructions per clock. By “slots”, I’m assuming he means entries in the dispatcher or reservation stations. But NOPs don’t even make it that far. As I said, the decoder that encounters one will probably swallow it and emit nothing.

Besides, it sounds like premature optimization. This isn’t the 1980s; an extra clock cycle per function call is not going to make or break your program.

Modern.

There is a very good chance this dates back to 16-bit Windows. Even Windows 98 supported the 486, which was not capable of independent execution (that’s P5) or of separating decode from execution (P6, the Pentium Pro, there).

Those processors weren’t dead until Windows XP.

At the time this was relevant, it wouldn't have been premature optimization. Saving those cycles on every function call would have been a reasonable win.