| HN Mirror



	by layer8 1426 days ago
	The article explains why a MOV is used instead of two NOPs. Five NOPs would obviously be even worse.

2 comments

colejohnson66 1426 days ago

But NOPs aren’t even executed. They’re swallowed by either the decoder or dispatcher.

avianlyric 1426 days ago

Because you can’t atomically replace the NOPs. So there’s nothing to prevent you from inserting your patch while a thread is partway through consuming the NOPs, and that thread would then start decoding your patch from somewhere in the middle of it.
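
To make the atomicity point concrete, here is a rough sketch of the byte arithmetic. It assumes the usual hotpatch layout, which nobody in this thread spells out: 5 bytes of padding directly before the function entry and a 2-byte `MOV EDI, EDI` at the entry. The addresses and helper names are made up for illustration.

```python
import struct

PAD = 5                             # assumed padding bytes before the entry
MOV_EDI_EDI = bytes([0x8B, 0xFF])   # assumed 2-byte placeholder at the entry


def short_jump_back():
    """2-byte `jmp short` that replaces the placeholder in one write."""
    # rel8 is measured from the end of the 2-byte jump: -(2 + PAD) = -7
    rel8 = -(len(MOV_EDI_EDI) + PAD)
    return bytes([0xEB]) + struct.pack("<b", rel8)


def long_jump(entry_addr, target_addr):
    """5-byte `jmp rel32` written into the padding just before entry_addr."""
    # The jump occupies entry_addr-5 .. entry_addr-1, so it ends at entry_addr.
    rel32 = target_addr - entry_addr
    return bytes([0xE9]) + struct.pack("<i", rel32)


def resume_view(patched, bytes_already_consumed):
    """Bytes a thread would decode next if it had already passed that many
    one-byte NOPs when the patch landed on top of them."""
    return patched[bytes_already_consumed:]


print(short_jump_back().hex())                # ebf9
print(long_jump(0x401005, 0x402000).hex())    # e9fb0f0000
# A thread two NOPs in would resume inside the rel32 field, not at an opcode:
print(resume_view(long_jump(0x401005, 0x402000), 2).hex())  # 0f0000
```

With a single 2-byte instruction at the entry there is no "partway through" position for a thread to be in, which is the whole point of the MOV.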

wvenable 1426 days ago

The article states that it's one cycle and slot per NOP.

colejohnson66 1426 days ago

Modern x86 processors decode multiple instructions per clock. By “slots”, I’m assuming he means entries in the dispatcher or reservation stations. But NOPs don’t even make it there. As I said, the decoder that encounters one will probably swallow it and emit nothing.

Besides, it sounds like premature optimization. This isn’t the 1980s; an extra clock cycle per function call is not going to make or break your program.

MBCook 1426 days ago

Modern.

There is a very good chance this dates back to 16-bit Windows. Even Windows 98 supported the 486, which was not capable of independent execution (that’s the P5) or of separating decode from execution (that’s the P6, i.e. the Pentium Pro).

Those processors weren’t dead until Windows XP.

wvenable 1426 days ago

At the time this was relevant, it wouldn't have been premature optimization. Reducing that many cycles per function call would be a reasonable win.

mFixman 1426 days ago

But isn't there a 5-byte single instruction that has no effect, like `NOP DWORD ptr [EAX + EAX*1 + 00H]`?

I thought that multibyte NOPs were executed as a single instruction?
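
For reference, a small sketch of how padding can be filled with the fewest NOP instructions. The table holds the commonly recommended multi-byte NOP encodings up to 5 bytes (the 5-byte entry is the encoding of the instruction quoted above); `nop_padding` is a made-up helper, not anything from a real assembler.

```python
# Commonly recommended NOP encodings, keyed by length in bytes.
NOP_ENCODINGS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),  # NOP DWORD ptr [EAX + EAX*1 + 00H]
}


def nop_padding(n):
    """Fill n bytes of padding greedily, longest NOP first."""
    out = []
    longest = max(NOP_ENCODINGS)
    while n > 0:
        size = min(n, longest)
        out.append(NOP_ENCODINGS[size])
        n -= size
    return out


five_single = [NOP_ENCODINGS[1]] * 5
print(len(five_single), len(nop_padding(5)))  # 5 1
```

Either way the padding is 5 bytes; only the instruction count differs, and a single 5-byte instruction still can't be swapped out with one 2-byte write.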

MBCook 1426 days ago

They may not have been coalesced at the time the decision was made.

layer8 1426 days ago

I’m pretty sure it would be slower, if only by taking up more space in the instruction cache (in the common case where no hotpatch is applied).