Hacker News new | ask | show | jobs
by filam 2791 days ago
I have to admit I stopped reading after the combined section on byte swapping and casting external data to a internal structure. Perhaps the author of the code should have used ntohs() instead of be16toh(), but the effect is the same? There aren't any conditionals in ntoh...

https://commandcenter.blogspot.com/2012/04/byte-order-fallac...

And will a compiler make use of native byte swapping instructions with the code proposed by the author?

5 comments

The argument against byte swapping suggests that you shouldn't take a value of a numeric type and apply some (maybe no-op) permutation to it to produce another value of the numeric type. Instead, you should only ever think about byte order at the step where you have an array (or stream, or...) of individual bytes and are converting it to a value of a numeric type (larger than uint8).

I think this is also the main point of the post you linked: the "bad" example has a numeric value and then performs byte swapping on it, the "good" examples only ever construct "correctly ordered" numeric values.

If you can possibly get to a place where you have a uint16 but the value is "in the wrong byte order", you've done something wrong already.

Admitting the possibility of values with the wrong byte order into your system casts every single value into doubt. Of course it's possible to resolve that doubt on a case-by-case basis by carefully identifying the right places to insert ntohs()/htons() calls, but to me that sounds like creating extra chores for yourself while working against the language.

This philosophy is similar to Python 3's hard separation of bytes and strings. Bytes are raw data you get off the wire and which are encoded in a certain way. Once decoded into a string, you have a clean, consistent internal representation that behaves as you expect a string would. If you're juggling (or even thinking about) byte encodings past the parsing stage and before the output stage, you've done something wrong.
Yeah, it generalizes to an argument for using types to encode assumptions about data, so you have static certainty/remove dynamic uncertainty. I'm strongly in favor of all arguments that let me remove sources of uncertainty from my mental model, since enough stuff I'm uncertain about always remains.
Yes, agreed. It is a step backward to manipulate and recombine bytes. "Extracting an integer from a packet, then changing it's internal format" is exactly what is being done, and thinking at that abstraction level is beter than playing with individual bytes. And suppose you hoist those operations into a separate function instead of doing it for every field? Well, congratulations on reimplementing ntohX. If the boundary between internal and external data is not clear, that's a bigger issue that could cause incorrect or unnecessary data transformations, whether we're talking about byte order, string sanitizing, unit conversions, etc.
The problem is that the higher-level abstraction is broken. Specifically, the abstraction provided here is "extract an integer of unspecified size and endianness from a byte stream" (note that the size of short isn't guaranteed to be anything specifically, other than 2 <= sizeof(short) <= sizeof(int)). And ntohs is another abstraction which amounts to, "make the endianness correct for the current system, assuming that it was originally big-endian". Not that it doesn't work - it's just not well-designed, and too easy to use incorrectly.

A proper abstraction is the one where you specify both the size and the endianness at the point where you extract the value - i.e. where you just call a function like read_int16_be(byte_stream).

> (note that the size of short isn't guaranteed to be anything specifically, other than 2 <= sizeof(short) <= sizeof(int)).

That isn't guaranteed. sizeof(short) and sizeof(int) are both allowed to be 1 on a system with chars at least 16 or 32 bits wide, respectively.

Yep, you're right.

Which just goes to show that these are not the types you want to be using for any kind of portable binary I/O.

Who the actual fuck cares about native byte swapping instructions. We're talking about DHCP code, there is no computer in the universe on which parsing of DHCP packets is even noticeable as a blip on the performance chart.
We're talking about safe ways to write parsers for byte streams. The original article used systemd's DHCP bug as a motivating example, but none of this is specific to DHCP.
The blogpost you link seems to basically recommend the same as the author, except they only talk about the conversion step and not about where it should take place too?
That post by Rob Pike is arguing that you should not need to check the endianness of your system when swapping. Not that you should never byte swap. Systems and protocols have different endianness, which needs to be handled by programs where endianness has not been abstracted away.
which is the same the submission says? Extract the data using a function that produces a value "mathematically" that's always correct for the host machine. His example uses multiplication instead of shifts like Pike, but that's a minor stylistic difference.
Rob Pike is saying that your code should never care/check what the host byte order is; if you use the libc FOOtoh() functions, you aren't caring what the host byte order is, you're just converting something to it.

Put another way, Pike is saying that libc should have implemented be16toh() like:

    # define be16toh(x) ( (((char*)x)[1]<<0) | (((char*)x)[0]<<8) )
instead of how GNU libc implemented it:

    # if __BYTE_ORDER == __LITTLE_ENDIAN
    #  define be16toh(x) __bswap_16 (x)
    # else
    #  define be16toh(x) __uint16_identity (x)
    # endif
But that complaint doesn't affect programs that call be16toh().
The issue, I think, is when the swapping is done. If you cast raw bytes into a structure you can forget to swap. If you read fields individually with a helper functions that swap when needed, you can't.

Whether there's a endianness conditional anywhere there or not is a bit of pedantry, I think.