|
|
|
|
|
by rob74
2 days ago
|
|
Ok, then it will be an explosion of binary size, if you have several code blocks optimized for each architecture level - I'm not very familiar with the subject, but I imagine it would have to be relatively large chunks of code, otherwise the constant branching would eat up the speed advantage. |
|
An unspecialised popcnt is half the dozen instructions, for specialised versions it’s 4 implementations ranging from half a dozen to two dozen bytes.