by kaiwetzel
5194 days ago
The instruction set page linked in the article mentions: "To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the MOVDQA instruction." Would it not be beneficial to use this instruction for most of the memory to be copied and only use the slower variants for the leading and trailing bytes?

Do the C semantics prevent the compiler from issuing a run-time check for the (rare) aliasing case and proceeding with the fast version in the common case (probably unrolled, too)?

Out of curiosity: the code uses a special counter variable, count, which has to be decremented separately. Is this faster than testing for dst != behind_last_dst_prt?

(Aside: my gut feeling tells me that if the combination of C/8086 can't pull this example off at the maximum memory-to-processor transfer speed for sufficiently large input vectors, there is something seriously rotten ...)
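To make the three questions concrete, here is a rough sketch of what I have in mind. It is not the article's code: the names ranges_overlap, copy_disjoint, copy_checked and end are made up for illustration, and I have not measured whether any of this is actually faster. It assumes SSE2 intrinsics from emmintrin.h where available and falls back to memcpy elsewhere. Only the destination is forced onto a 16-byte boundary, so the store can be the aligned form while the load stays unaligned, since source and destination need not share the same alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: nonzero if [a, a+n) and [b, b+n) share any byte. */
static int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return (pa < pb) ? (pb - pa < n) : (pa - pb < n);
}

/* Hypothetical fast path: the caller guarantees the buffers do not overlap. */
static void copy_disjoint(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char *end = dst + n;   /* one past the last destination byte */

    /* Head: single bytes until dst sits on a 16-byte boundary. */
    while (dst != end && ((uintptr_t)dst & 15u) != 0)
        *dst++ = *src++;

    /* Bulk: 16 bytes per step; dst is aligned here, src may not be. */
    while ((size_t)(end - dst) >= 16) {
#if defined(__SSE2__)
        __m128i v = _mm_loadu_si128((const __m128i *)src); /* unaligned load  */
        _mm_store_si128((__m128i *)dst, v);                /* aligned store   */
#else
        memcpy(dst, src, 16);                              /* portable fallback */
#endif
        dst += 16;
        src += 16;
    }

    /* Tail: the remaining 0..15 bytes. The end pointer stands in for a
       separately decremented counter. */
    while (dst != end)
        *dst++ = *src++;
}

/* Run-time check: the rare overlapping case goes to memmove,
   everything else takes the fast path. */
static void copy_checked(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return;
    assert(dst != NULL && src != NULL);
    if (ranges_overlap(dst, src, n))
        memmove(dst, src, n);
    else
        copy_disjoint((unsigned char *)dst, (const unsigned char *)src, n);
}
```

The overlap test costs two comparisons per call, so for large inputs it should disappear next to the copy itself; whether a compiler is allowed to insert such a check on its own is exactly the part I am unsure about.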