by robomartin 4707 days ago
If you've ever dealt with graphics file manipulation code, chances are you've suffered the pain of changing the endian-ness of an image file. I never understood why some of these operations are not implemented as machine instructions that run in a single cycle. There's nothing to them; I've done exactly that on FPGAs. Yes, they can be a little resource/routing intensive, but not that bad.
3 comments

> changing the endian-ness

x86 has had the BSWAP instruction since the 486.

gcc has __builtin_bswap16, __builtin_bswap32, and __builtin_bswap64, which will presumably take advantage of these instructions on x86 and on any other gcc-supported architecture where similar instructions exist (and fall back to a reasonably fast and well-tested multi-instruction implementation where they don't).
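
For illustration, here is a minimal sketch of how the 32-bit builtin might be used, with a shift-and-mask fallback for compilers that lack it. The function names (swap32, swap32_portable, swap32_buffer) are made up for this example, and whether the builtin compiles down to a single BSWAP depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: reverse the four bytes of a 32-bit word with shifts and masks. */
static uint32_t swap32_portable(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

/* Use the GCC builtin where it is available (GCC and Clang both define __GNUC__). */
static uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return swap32_portable(x);
#endif
}

/* Hypothetical helper: byte-swap a buffer of 32-bit words (e.g. pixels) in place. */
static void swap32_buffer(uint32_t *words, size_t count)
{
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        words[i] = swap32(words[i]);
    }
}
```

With this sketch, swap32(0x11223344u) yields 0x44332211u on either path.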

You should really RTFM every couple of years, just to know what your processor [1] and compiler [2] can do.

[1] http://www.intel.com/content/www/us/en/processors/architectu...

[2] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

Oh, I RTFM. Not always working on Intel platforms though. And still:

http://hardwarebug.org/2010/01/14/beware-the-builtins/

Note that this post is about changing the order of bits within a byte (e.g. the x86-64 XOP instruction VPPERM with its bit reversal option), whereas you are talking about changing the order of bytes within a word (the x86 instruction BSWAP).
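
To make the distinction concrete, here is a rough software sketch of reversing the bits within each byte, which leaves the byte order alone (the opposite of what BSWAP does). The names reverse_bits8 and reverse_bits_buffer are invented for this example, and this is just the usual swap-halves trick, not what VPPERM does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order within one byte: swap nibbles, then bit pairs, then adjacent bits. */
static uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Hypothetical helper: reverse the bits of every byte in a buffer, keeping byte order. */
static void reverse_bits_buffer(uint8_t *bytes, size_t count)
{
    assert(bytes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        bytes[i] = reverse_bits8(bytes[i]);
    }
}
```

For example, reverse_bits8(0x01) gives 0x80, while a byte swap of a one-byte value would change nothing.
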
Wait, why is it resource intensive? If all you need to do is reverse a fixed-size integer, wouldn't you just wire the inputs to the outputs backwards?
Depending on timing requirements, device type, operating speed and word width you have to add one or more layers of flip-flops to facilitate timing closure and avoid potential metastability issues.
Right, but that's true of all CPU instructions. If you already have an ALU capable of doing things like integer multiplication, would adding what is essentially a bunch of chained flip-flops really add much more complexity or resource usage?
I was mostly talking about FPGAs. I don't know the criteria designers use when making decisions about what to add (or not) to a CPU design.