by nkurz 4706 days ago
Thanks for looking at this! This is wonderfully succinct (presuming it works; I'm still staring at it), but the 20+ cycle latency of DIV makes it impractical. I think I can do it with a multiply, AND, multiply, shift sequence, but even the 3-cycle latency of each multiply is too much:

  key *= 0x04040400UL;
  key &= 0x30C30C00UL;
  key *= 0x00041041UL;
  key >>= 28;
Let me stare at your shuffling solution for a bit...
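
For anyone following along, here is a minimal sketch of what the multiply/AND/multiply/shift sequence appears to compute. It rests on two assumptions that the snippet itself does not state: that key holds an 8-bit value, and that the arithmetic wraps at 32 bits (hence uint32_t rather than an unsigned long that may be 64 bits wide). Under those assumptions the result is the sum of the four 2-bit fields of key. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: add the four 2-bit fields of an 8-bit value directly. */
uint32_t sum_fields_reference(uint32_t key)
{
    return (key & 3) + ((key >> 2) & 3) + ((key >> 4) & 3) + ((key >> 6) & 3);
}

/* The sequence from the comment, with 32-bit wraparound made explicit. */
uint32_t sum_fields_mul(uint32_t key)
{
    assert(key <= 0xFF);          /* assumption: only the low 8 bits are used */
    key *= UINT32_C(0x04040400);  /* copies of key at bit offsets 10, 18, 26 */
    key &= UINT32_C(0x30C30C00);  /* keep one 2-bit field at bits 10, 16, 22, 28 */
    key *= UINT32_C(0x00041041);  /* shifts by 0, 6, 12, 18 stack all four at bit 28 */
    key >>= 28;                   /* top nibble now holds the sum, in 0..12 */
    return key;
}

/* Count the 8-bit inputs where the two versions disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t key = 0; key <= 0xFF; key++) {
        if (sum_fields_mul(key) != sum_fields_reference(key))
            mismatches++;
    }
    return mismatches;
}
```

The mask spaces the fields 6 bits apart, so the partial sums that the second multiply produces below bit 28 never exceed 9 and cannot carry into the top nibble.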
1 comments

In the interest of achieving the simplest instruction sequence, I've also found (BMI2, which shipped alongside AVX2 on Haswell):

  key   = _pdep_u32(key, 0x03030303);
  key  *= 0x01010101;
  key >>= 24;
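
A rough way to check the PDEP version without BMI2 hardware is to emulate the instruction in plain C. The sketch below does that: pdep32_soft is a hypothetical portable stand-in for _pdep_u32 (not a real intrinsic), and the other function names are likewise invented for illustration. As with the multiply-only variant, the result is the sum of the four 2-bit fields in the low byte of key.

```c
#include <stdint.h>

/* Hypothetical software stand-in for _pdep_u32: deposit the low bits of src,
 * in order, into the positions where mask has a 1 bit. */
uint32_t pdep32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set mask bit */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return result;
}

/* The three-step sequence from the comment, using the emulated PDEP. */
uint32_t sum_fields_pdep(uint32_t key)
{
    key = pdep32_soft(key, UINT32_C(0x03030303)); /* one 2-bit field per byte */
    key *= UINT32_C(0x01010101);  /* top byte becomes the sum of all four bytes */
    key >>= 24;                   /* each byte is at most 3, so the sum fits */
    return key;
}

/* Count the 8-bit inputs where the result differs from a direct field sum. */
int count_pdep_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t key = 0; key <= 0xFF; key++) {
        uint32_t expected = (key & 3) + ((key >> 2) & 3)
                          + ((key >> 4) & 3) + ((key >> 6) & 3);
        if (sum_fields_pdep(key) != expected)
            mismatches++;
    }
    return mismatches;
}
```

With the mask 0x03030303 only the low 8 bits of key are consumed, so any higher bits are ignored rather than corrupting the sum. On real hardware the emulation would be replaced by _pdep_u32 from <immintrin.h>, compiled with BMI2 enabled.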