Hacker News

by Unklejoe 1816 days ago

Is there a technical reason why little endian is better besides the fact that it's more popular due to x86/ARM?

To me, big endian makes a lot more sense - integers are stored in the order which you read them (mentally).

I do remember reading about how little endian enabled some type of optimization inside the CPU, but I forget the specifics.

SAI_Peregrinus 1816 days ago

For values that fit in a machine word, there are adder and multiplier designs that make the difference irrelevant. For larger values, or with other adder/multiplier designs that make different trade-offs, LE is dramatically faster.

Specifically, the problem is with "carries". When you're adding two binary values (or subtracting, multiplying, or dividing them; I'll just discuss adding), you might have to carry a 1 to the next place.

If you've got a BE value and an adder stage smaller than that value (say, a 32-bit number and 1-bit adder stages), you have to propagate a carry through many stages to produce the result. If you're receiving the value in BE order 1 bit at a time, you can't start the computation until you have the LSBs of both values, since if they're both 1, their carry affects the second bit. So you're stuck waiting for the entire value before you can start. Further, during the computation you have to wait for every 1-bit adder in sequence.
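To make the 1-bit case concrete, here is a minimal Python sketch of a bit-serial adder, assuming the bits of both operands arrive least significant first (the LE order). Each sum bit can be emitted as soon as its two input bits arrive; with MSB-first input the loop would first have to buffer everything. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def serial_add_lsb_first(bit_pairs: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """Bit-serial addition: consume (a_i, b_i) pairs LSB first and yield
    each sum bit as soon as its inputs have arrived."""
    carry = 0
    for a, b in bit_pairs:
        total = a + b + carry
        yield total & 1      # sum bit for this position
        carry = total >> 1   # carry into the next, more significant position
    yield carry              # the final carry-out becomes the top bit


def to_bits_lsb_first(x: int, width: int) -> List[int]:
    return [(x >> i) & 1 for i in range(width)]


def from_bits_lsb_first(bits: Iterable[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


a, b = 11, 6
pairs = zip(to_bits_lsb_first(a, 4), to_bits_lsb_first(b, 4))
print(from_bits_lsb_first(serial_add_lsb_first(pairs)))  # 17
```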

There are "fast" adder designs that don't have to wait for every bit but instead work on groups of multiple bits, with a carry-out at the end of each group. So with a 32-bit value and an 8-bit group size, you'd have at most 3 carry delays (one per boundary between adjacent groups) while computing the output. For BE, you'd have to wait for all 4 bytes to be received, then wait for all the carry delays. For LE, you can start the computation as soon as the first byte is received, saving some time.
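A rough software model of the 8-bit-group case, assuming 32-bit operands stored as 4 little-endian bytes. It only models the order of work and counts how many times a carry crosses a group boundary (at most 3 for 4 groups); real carry-lookahead or carry-select hardware is considerably more involved. The function name is hypothetical.

```python
from typing import Tuple


def grouped_add_le(a: bytes, b: bytes) -> Tuple[bytes, int, int]:
    """Add two equal-length little-endian byte strings one 8-bit group at a time.

    Returns (sum_bytes, carry_out, group_carries), where group_carries counts
    how often a carry had to be handed from one group to the next.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = bytearray()
    carry = 0
    group_carries = 0
    # Byte 0 is the least significant group, so groups are processed in
    # storage (or arrival) order; no need to wait for the whole value.
    for i, (x, y) in enumerate(zip(a, b)):
        total = x + y + carry
        out.append(total & 0xFF)
        carry = total >> 8
        if carry and i < len(a) - 1:
            group_carries += 1  # carry passed to the next group
    return bytes(out), carry, group_carries


# Worst case for 4 groups: the carry ripples across every group boundary.
s, c, n = grouped_add_le((0x00FFFFFF).to_bytes(4, "little"), (1).to_bytes(4, "little"))
print(hex(int.from_bytes(s, "little")), c, n)  # 0x1000000 0 3
```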

The larger the adder group size, the more die area is needed, the stronger the transistors' drive strength needs to be (bigger fan-out), and the slower the maximum clock of the overall system. On the other hand, the bigger the group size, the fewer the carry delays, so addition can take fewer cycles. Most CPUs and MCUs implement single-cycle addition at their word size. Some CPUs even implement single-cycle multiplication at their word size.

monocasa 1816 days ago

On pretty much anything Linux runs on, full words are loaded into registers before a multiply begins. Even on something like x86, where a multiply can take a memory operand that might straddle a cache-line boundary (so only part of the word would arrive at first), the core still splits the instruction into micro-ops: a load into a (temporary) register, then the multiply itself.

SAI_Peregrinus 1815 days ago

Correct. There are a few cases where the operands don't fit into a single machine word, the most notable being many cryptographic operations, particularly RSA and ECC, which involve multiple-precision arithmetic.
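Multiple-precision arithmetic is where limb order shows up directly in software. Below is a minimal schoolbook multiply, assuming numbers are stored as arrays of 32-bit limbs in little-endian order (limb i has weight 2**(32*i)), so the partial product of limbs i and j simply lands at index i + j. It is an illustrative sketch with hypothetical names, not how production RSA/ECC libraries are tuned.

```python
from typing import List

LIMB_BITS = 32  # assumption: 32-bit limbs
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, n: int) -> List[int]:
    # Little-endian limb order: limbs[i] has weight 2**(32 * i).
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n)]


def from_limbs(limbs: List[int]) -> int:
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_limbs(a: List[int], b: List[int]) -> List[int]:
    """Schoolbook multiplication of little-endian limb arrays."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # The partial product of limbs i and j lands at index i + j.
            t = out[i + j] + ai * bj + carry
            out[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        out[i + len(b)] = carry  # this slot is still empty for row i
    return out


x, y = 2**64 + 12345, 2**70 - 1
print(from_limbs(mul_limbs(to_limbs(x, 3), to_limbs(y, 3))) == x * y)  # True
```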

There are also non-Linux cases, mostly microcontrollers. E.g. the Arm Cortex-M0 and M0+ have no hardware divider, and their multiplier can be implemented as a small 32-cycle iterative one rather than a single-cycle one.

And then there's that one guy who got Linux running on an 8-bit AVR by emulating a 32-bit ARM and running it on that[1]. I'd consider this a silly edge case. Too fun not to mention though.

[1] https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...

ectopod 1816 days ago

When you add numbers together, you start at the little end. Imagine a bignum implementation with multi-word numbers, and a file full of them that you want to add up. If the numbers are stored in little-endian order, you can write a streaming implementation that reads and adds at the same time. If they are in big-endian order, you need to read a whole number before you can add it to the accumulator.
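A minimal Python sketch of that streaming sum, assuming (hypothetically) that every number in the file occupies the same number of 64-bit words, least significant word first, with each word itself little-endian. `io.BytesIO` stands in for the file, and the function name is illustrative only.

```python
import io
from typing import BinaryIO, List

WORD_BYTES = 8  # assumption: 64-bit words, each stored little-endian
WORD_BITS = 8 * WORD_BYTES
WORD_MASK = (1 << WORD_BITS) - 1


def stream_sum_le(f: BinaryIO, words_per_number: int, acc_words: int) -> List[int]:
    """Sum fixed-width numbers stored least significant word first.

    Every word is folded into the accumulator as soon as it is read, so no
    whole number is ever buffered. acc_words must be large enough to hold
    the total; the input is assumed to contain only whole numbers.
    """
    acc = [0] * acc_words
    carry = 0
    i = 0  # position of the current word within the current number
    while chunk := f.read(WORD_BYTES):
        total = acc[i] + int.from_bytes(chunk, "little") + carry
        acc[i] = total & WORD_MASK
        carry = total >> WORD_BITS
        i += 1
        if i == words_per_number:
            # End of one number: ripple any leftover carry into the upper words.
            j = i
            while carry:
                total = acc[j] + carry
                acc[j] = total & WORD_MASK
                carry = total >> WORD_BITS
                j += 1
            i = 0
    return acc


nums = [2**100 + 7, 2**127 - 1, 12345]
data = b"".join(n.to_bytes(2 * WORD_BYTES, "little") for n in nums)
acc = stream_sum_le(io.BytesIO(data), words_per_number=2, acc_words=3)
print(sum(w << (WORD_BITS * k) for k, w in enumerate(acc)) == sum(nums))  # True
```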

Obviously this example is very contrived. This sort of thing was much more of a concern on 8-bit computers. But little endian still seems more natural to me.

Unklejoe 1816 days ago

That makes sense. It would also help from a cache prefetching perspective.

wizee 1816 days ago

With little endian, if you take a pointer to an integer of a large type (e.g. uint64_t) whose value fits in a smaller type (e.g. uint32_t), you get the same correct value whether you access it as a uint64_t or as a uint32_t. This can make integer type conversion/casting slightly more efficient and simplify code a bit.
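A small Python model of that cast, using `struct` to spell out the bytes of a `uint64_t` in each byte order. Slicing the first 4 bytes stands in for reading the same address through a `uint32_t*`; it illustrates the layout only (in C itself, reading through a cast pointer also runs into strict-aliasing rules, and `memcpy` is the safer route).

```python
import struct

value = 1234  # fits in a uint32_t, but stored as a uint64_t
le = struct.pack("<Q", value)  # the 8 bytes of a little-endian uint64_t
be = struct.pack(">Q", value)  # the 8 bytes of a big-endian uint64_t

# Reading the first 4 bytes models loading the same address as a uint32_t.
print(struct.unpack("<I", le[:4])[0])  # 1234: same address, same value
print(struct.unpack(">I", be[:4])[0])  # 0: on BE the first 4 bytes are the high half
print(struct.unpack(">I", be[4:])[0])  # 1234: BE needs the address offset by 4
```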

quietbritishjim 1816 days ago

One small argument for little endian is that if you have a void* pointer, in little-endian format you can interpret it as an int8_t*, int16_t*, etc. (or char*, short*, etc. in old money) and get the same numerical value, provided the number is small enough to fit into every type you try. I don't think that has much practical use, but it does have a nice feel about it.

daenz 1816 days ago

Sounds like a footgun to me

bonzini 1815 days ago

It means, for example, that a bitmap looks the same no matter whether the code accesses it in groups of 8, 16, 32, or 64 bits.
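A Python sketch of that point, assuming the common byte-wise bitmap convention where bit i lives in byte i // 8 at position i % 8. Loading the buffer as little-endian words of any width finds each bit at the same index, while a big-endian 16-bit load does not. Helper names are hypothetical.

```python
import struct

SET_BITS = {0, 5, 13, 42, 63}
bitmap = bytearray(8)  # a 64-bit bitmap


def set_bit(buf: bytearray, i: int) -> None:
    buf[i // 8] |= 1 << (i % 8)  # byte-wise convention: bit i is in byte i // 8


def get_bit(buf: bytes, i: int, width: int) -> int:
    """Read bit i by loading little-endian words of `width` bits."""
    fmt = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}[width]
    word_index, bit_in_word = divmod(i, width)
    (word,) = struct.unpack_from(fmt, buf, word_index * (width // 8))
    return (word >> bit_in_word) & 1


for i in SET_BITS:
    set_bit(bitmap, i)

# Every word width agrees with the byte-wise view.
print(all(get_bit(bitmap, i, w) == (i in SET_BITS)
          for i in range(64) for w in (8, 16, 32, 64)))  # True

# A big-endian 16-bit load breaks that: bit 0 is set, but reads as 0 here.
print(struct.unpack_from(">H", bitmap, 0)[0] & 1)  # 0
```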

daenz 1815 days ago

I get it, but it also means values will appear to be correct, instead of obviously wrong, if the data is cast without regard for the value range. Then one day someone enters a value that exceeds that range, and BOOM.
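That failure mode is easy to reproduce. In this Python sketch (the helper is hypothetical, with `struct` modelling a narrowing pointer cast on a little-endian layout), every value that fits in 16 bits comes back intact, and the first one that doesn't is silently reduced to its low 16 bits instead of failing loudly.

```python
import struct


def read_low_u16(buf: bytes) -> int:
    # Models reading a little-endian uint32_t through a uint16_t*:
    # only the low 2 bytes are seen.
    return struct.unpack_from("<H", buf, 0)[0]


for value in (1234, 65535, 70000):
    print(value, read_low_u16(struct.pack("<I", value)))
# 1234 1234
# 65535 65535
# 70000 4464   <- no error, just a plausible-looking wrong number
```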

phkahler 1816 days ago

>> Is there a technical reason why little endian is better besides the fact that it's more popular due to x86/ARM? To me, big endian makes a lot more sense

Some of the other replies have minor technical reasons, but I've always preferred big endian for the readability. Having said that, I'm happy to part with the idea of big endian if it means an end to having 2 options to worry about. One thing that bothers me a lot about RISC-V is that the standard claims to allow either big or little endian implementations. Little has won and nothing new should support big endian IMHO. The benefits of either are largely irrelevant, but the existence of both is a problem. Or maybe it's that the existence of code that cares is the real problem ;-)

cesarb 1816 days ago

Besides what others have said, little endian is more natural: in an n-byte integer, the byte at offset b has weight 256**b, instead of 256**(n - b - 1).
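The two formulas written out as a Python sketch (function names are illustrative). Because the LE weights don't depend on the total length n, zero-extending an LE value leaves it unchanged, which is the same property the casting comments above rely on.

```python
def value_le(bs: bytes) -> int:
    # The byte at offset b has weight 256**b, whatever the length n.
    return sum(x * 256**b for b, x in enumerate(bs))


def value_be(bs: bytes) -> int:
    # The byte at offset b has weight 256**(n - b - 1), so every weight depends on n.
    n = len(bs)
    return sum(x * 256**(n - b - 1) for b, x in enumerate(bs))


data = bytes([0x78, 0x56, 0x34, 0x12])
print(hex(value_le(data)), hex(value_be(data)))  # 0x12345678 0x78563412

# Appending a zero byte (widening the type) leaves the LE value unchanged.
print(value_le(data + b"\x00") == value_le(data))  # True
print(value_be(data + b"\x00") == value_be(data))  # False
```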

bigbillheck 1816 days ago

I agree that big endian makes more sense, but I think that particular ship has long since sailed.
