by kahirsch 1086 days ago
This is a holdover from COBOL, which uses fixed-length, space-padded strings.

I read an interesting bit in a DBMS paper once about space-compression and the oddities of space-padding collation order.

As this article says, 'A\x03' is supposed to come before 'A' in collation order, because the 'A' is (conceptually, at least) padded with spaces before being compared, and \x03 sorts below space (\x20).
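To make the padding rule concrete, here is a minimal Python sketch. The function name `padded_compare` is made up for illustration; it just pads the shorter byte string with spaces and then compares, which is the conceptual rule described above and not any particular DBMS's implementation.

```python
def padded_compare(a: bytes, b: bytes) -> int:
    """Compare two byte strings as if the shorter were space-padded.

    Returns -1, 0, or 1. Hypothetical helper, for illustration only.
    """
    width = max(len(a), len(b))
    pa = a.ljust(width, b" ")  # pad on the right with 0x20
    pb = b.ljust(width, b" ")
    return (pa > pb) - (pa < pb)


# Under padding, 'A\x03' sorts before 'A' because 0x03 < 0x20.
result = padded_compare(b"A\x03", b"A")
# A plain (unpadded) comparison gives the opposite answer, since
# b"A" is a proper prefix of b"A\x03".
plain = b"A" < b"A\x03"
```

Note that trailing spaces also stop mattering: under this rule 'A' and 'A  ' compare equal.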

The paper I was reading therefore needed two different representations for a compressed run of spaces: one for when the run is followed by a character > space (or by end-of-string), and another for when it is followed by a character < space. With that encoding in place, you can just use memcmp to compare the strings.
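I don't remember the paper's actual byte layout, so the following is only a sketch of one encoding that has the property described: two kinds of run tokens plus an end marker, chosen so that plain byte-wise comparison (what memcmp does) agrees with space-padded comparison. Everything here (`encode`, the `END` value, the count limits) is my own assumption, not the paper's format.

```python
SPACE = 0x20
END = 0x80       # assumed end-of-string marker, sorts between the two run kinds
MAX_LOW = 0x7F   # "low" runs store the count directly: 0x01..0x7F
MAX_HIGH = 0x7E  # "high" runs store 0xFF - count: 0xFE..0x81


def encode(s: bytes) -> bytes:
    """Hypothetical order-preserving encoding with compressed space runs.

    A run followed by a byte < space ("low") sorts higher the longer it is,
    so its count is stored as-is. A run followed by a byte > space ("high")
    sorts lower the longer it is, so its count is stored inverted.
    """
    s = s.rstrip(b" ")  # trailing spaces are implied by padding
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] != SPACE:
            out.append(s[i])
            i += 1
            continue
        j = i
        while s[j] == SPACE:  # in bounds: s no longer ends with a space
            j += 1
        run = j - i
        low = s[j] < SPACE
        limit = MAX_LOW if low else MAX_HIGH
        while run > 0:  # split runs too long for one count byte
            n = min(run, limit)
            out += bytes((SPACE, n if low else 0xFF - n))
            run -= n
        i = j
    out += bytes((SPACE, END))
    return bytes(out)
```

With this sketch, `encode(b"A \x03") < encode(b"A") < encode(b"A B")` holds under ordinary bytes comparison, matching the padded order. Every token starts with either a literal byte or 0x20, so a run always lands between characters below space and characters above it.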

I think that same paper talked about the pains that IBM had when they first ported DB2 (or maybe another earlier DBMS) to a little-endian architecture. There were many places in their collation and indexing code where they compared 4-character segments using integer comparison. That works great on IBM's big-endian architecture, but not on, e.g., x86, where the first character ends up in the least-significant byte of the integer.
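The endianness problem is easy to reproduce. This is just an illustration using Python's `int.from_bytes` to mimic loading a 4-byte segment as an integer under each byte order; the helper name `as_int` is made up, and it is not meant to resemble the original code.

```python
def as_int(segment: bytes, byteorder: str) -> int:
    """Interpret a 4-byte segment as an unsigned integer (illustrative helper)."""
    assert len(segment) == 4
    return int.from_bytes(segment, byteorder)


a, b = b"ABCD", b"BAAA"

# Byte-wise (memcmp-style) order: a sorts before b, since 'A' < 'B'.
bytewise = a < b

# Big-endian: the first character is the most significant byte,
# so integer order agrees with byte-wise order.
big_agrees = (as_int(a, "big") < as_int(b, "big")) == bytewise

# Little-endian: the last character is the most significant byte,
# so here 'D' vs 'A' decides the comparison and the order flips.
little_agrees = (as_int(a, "little") < as_int(b, "little")) == bytewise
```

Here `big_agrees` comes out true and `little_agrees` false, which is the trap: the same integer-compare shortcut gives the wrong order unless the bytes are swapped first.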