by toolate 5698 days ago
While I agree that 'z'++ being 'aa' is a bit silly, I can't ever see anyone making the case that 'aa' should be greater than 'z'.
4 comments

It's not that 'aa' would make any sense being greater than 'z', it's just the non-intuitive result that comes out of the combination of those features.

To my mind it's a good example of how languages that try to throw in every feature and the kitchen sink end up acquiring a lot of bizarre warts from unforeseen interactions.
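
For anyone who wants to see the interaction being discussed, here is a minimal sketch. It assumes a PHP version where `++` on an alphabetic string still performs the alphanumeric increment (newer releases, 8.3 and up, also offer `str_increment()` for this). The variable names are just for illustration.

```php
<?php
// The loop looks like it should stop after 'z', but 'z'++ gives 'aa',
// and 'aa' <= 'z' is true under plain string comparison, so it keeps going.
$seen = [];
for ($s = 'a'; $s <= 'z'; $s++) {
    $seen[] = $s;
}

$count = count($seen);
$last  = end($seen);

// Number of values visited, and the last one before the condition failed.
echo $count, ' values, last one is ', $last, "\n";
```

The loop only ends once the increment reaches 'za', which is the first value that compares greater than 'z'.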

As long as we're enforcing ordering on casually unordered sets, why not define it as the length of the string? Now 'aa' is certainly greater than 'z'.
There's a pretty good case for making the comparison closer to alphabetical ordering. 'apple' before 'pear' even though 'apple' is the longer string.

'z' before 'zz' also follows alphabetical ordering.

Note that there is no such thing as "the alphabetical ordering". Different languages define different collations, and some even multiple ones (e.g., German collation vs German phone book collation, or the various collation systems for Chinese characters). I'm pretty sure PHP's comparison operator will define non-ASCII characters as being outside of the alphabet, and probably just fail on multi-byte strings (UTF-8).

So if you are doing comparisons on strings, you probably either have an i18n bug or a really, really specific use case.
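
A small sketch of what the built-in comparison does with non-ASCII input, as far as I can tell: it does not fail on UTF-8, it simply compares bytes, which is rarely the order a reader expects. The sample words are made up for the example, and the `\u{...}` escape assumes PHP 7 or later.

```php
<?php
// strcmp() and the < / > operators compare non-numeric strings byte by byte.

// Uppercase sorts before lowercase: 'Z' is 0x5A, 'a' is 0x61.
$upperBeforeLower = strcmp('Zebra', 'apple') < 0;

// "\u{00E9}" is 'é', encoded in UTF-8 as the bytes 0xC3 0xA9.
// Its first byte is larger than 'z' (0x7A), so it sorts after every ASCII letter.
$accentAfterZ = strcmp("\u{00E9}clair", 'zebra') > 0;

var_dump($upperBeforeLower, $accentAfterZ);
```

Both results are true, which is fine for a byte ordering but wrong for any real alphabet; locale-aware collation (for example the intl extension's `Collator`) is the usual way around this.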

That's what I was thinking, and in this case 'zz' would be > 'a'. I'd say that's a reasonable choice.
Clearly you have not ventured too far outside the "safe" en-US cultures ;)

In the Scandinavian languages (Danish, Swedish and Norwegian) "aa" is (for whatever historical reasons) considered a synonym for "å", which is the last letter of the alphabet. In the same way "ae" maps to "æ" (third last) and "oe" maps to "ø" (second last).

Hence, using culture-aware sorting, the following array may actually not be sorted: [ "a", "aa", "ae", "b", "oe", "of" ]. You will find similar sort behaviour in a lot of databases when you set the database/table/column collation to other cultures.

While I agree the PHP implementation is silly, your blanket dismissal of the possibility that 'aa' can be greater than 'z' is equally ignorant.

A couple of years ago I had to devise PHP sort algorithms for Danish, German, and Russian. (Unsurprisingly, native speakers are essential for testing this kind of thing ;) )

It was surprising at the time that these sort algorithms were not generally available in code. We also found that we couldn't rely on the db (MySQL) to do the right thing, which was not helped, of course, by having these character sets, and others, in the db. The db was fully UTF-8.

Once you grok PHP's underlying method, it's simply a case of using the above-mentioned technique -- replacing the "foreign" symbols with character pairs -- to establish the correct order.

In the Danish case above, for example, I used zx, zy, and zz.
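
A rough sketch of how that substitution might look. The comment above does not say which letter got which pair, so the mapping here (æ to zx, ø to zy, å to zz, following the Danish order æ, ø, å after z) is an assumption, and `danishSortKey` is a hypothetical helper name. It also assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: build an ASCII sort key by replacing the three extra
// Danish letters with pairs that sort after every ordinary word starting with 'z'.
function danishSortKey(string $word): string
{
    $map = [
        'æ' => 'zx', 'Æ' => 'zx',
        'ø' => 'zy', 'Ø' => 'zy',
        'å' => 'zz', 'Å' => 'zz',
    ];
    // strtr() swaps the multi-byte letters first; strtolower() then folds ASCII case.
    return strtolower(strtr($word, $map));
}

$words = ['ål', 'zebra', 'æble', 'øl', 'abe'];

// Compare the derived keys bytewise instead of the original strings.
usort($words, fn (string $a, string $b): int => strcmp(danishSortKey($a), danishSortKey($b)));

echo implode(', ', $words), "\n";
```

One limitation of this kind of key: a real word containing "zx", "zy" or "zz" becomes indistinguishable from one containing the substituted letter, so it only works when such collisions do not matter for the data at hand.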

'z'++ is not 'aa'. 'z'++ is not even valid PHP. $z='z';$p=$z++; does nothing either.

It's only in the for-loop context.

Not really: after $p = $z++, $p is 'z' and $z is 'aa', as you'd kind-of expect. But the difference between literals and expressions/variable references of course just adds icing to the cake of bizarreness.
try...

<?php $z='z'; echo ++$z;

which produces 'aa'
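
To put the post-increment and pre-increment cases side by side, here is a short sketch (variable names are arbitrary; it assumes a PHP version where `++` still performs the alphanumeric string increment):

```php
<?php
// Post-increment: the expression yields the old value, then the variable changes.
$z = 'z';
$p = $z++;   // $p holds 'z', $z is now 'aa'

// Pre-increment: the variable changes first, and the expression yields the new value.
$y = 'z';
$q = ++$y;   // both $q and $y are 'aa'

// Writing 'z'++ on a literal is rejected by the parser, since ++ needs a variable.

echo "$p $z $q $y\n";
```

So nothing here is specific to for-loops; the loop just happens to be where the increment and the string comparison meet.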