HN Mirror

	by toolate 5746 days ago
	While I agree that 'z'++ being 'aa' is a bit silly, I can't ever see anyone making the case that 'aa' should be greater than 'z'.
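
A minimal sketch of the two behaviours under discussion, assuming a PHP 7 or 8 interpreter (`$s` is just an illustrative name). String increment rolls 'z' over to 'aa', while ordinary string comparison still puts 'aa' before 'z':

```php
<?php
// String increment: 'z' rolls over to 'aa', much as a counter rolls 9 over to 10.
$s = 'z';
$s++;
var_dump($s);          // string(2) "aa"

// String comparison: two non-numeric strings are compared byte by byte,
// and 'a' (0x61) is less than 'z' (0x7A), so 'aa' sorts before 'z'.
var_dump('aa' < 'z');  // bool(true)
```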

4 comments

msbarnett 5746 days ago

It's not that 'aa' would make any sense being greater than 'z'; it's just the unintuitive result that comes out of combining those features.

To my mind it's a good example of how languages that try to throw in every feature and the kitchen sink end up acquiring a lot of bizarre warts from unforeseen interactions.
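
Assuming the interaction meant here is the well-known loop below, a short sketch shows how the two features combine: `++` rolls 'z' over to 'aa', but `<=` still compares lexically, so the loop does not stop after 'z'.

```php
<?php
// Looks like it should visit 'a'..'z', but 'aa' <= 'z' is true, so it keeps
// going through every two-letter string until it reaches 'za' ('za' > 'z').
$seen = [];
for ($c = 'a'; $c <= 'z'; $c++) {
    $seen[] = $c;
}
echo count($seen), "\n";  // 676 (26 single letters + 25 * 26 pairs 'aa'..'yz')
echo end($seen), "\n";    // yz
```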

tel 5746 days ago

As long as we're enforcing an ordering on casually unordered sets, why not define it by the length of the string? Then 'aa' is certainly greater than 'z'.
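
A sketch of the length-first ordering suggested here, using a hypothetical comparator `length_first_cmp` (not a PHP built-in). Equal-length strings fall back to `strcmp()`. Note that `strlen()` counts bytes, so this only means "characters" for single-byte text:

```php
<?php
// Hypothetical comparator: shorter strings first; ties broken byte-wise.
function length_first_cmp(string $a, string $b): int
{
    $byLength = strlen($a) <=> strlen($b);
    return $byLength !== 0 ? $byLength : strcmp($a, $b);
}

$words = ['z', 'aa', 'pear', 'apple', 'b'];
usort($words, 'length_first_cmp');
echo implode(' ', $words), "\n"; // b z aa pear apple
```

This does make 'aa' greater than 'z', at the cost of putting 'pear' before 'apple'.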

brianpan 5746 days ago

There's a pretty good case for making the comparison closer to alphabetical ordering: 'apple' comes before 'pear' even though 'apple' is the longer string.

'z' before 'zz' also follows alphabetical ordering.
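
For plain lowercase ASCII words, PHP's byte-wise string comparison already gives this dictionary-style order. A minimal sketch (the word list is made up for illustration), including one place where byte order and alphabetical order part ways:

```php
<?php
$fruit = ['pear', 'zz', 'apple', 'z'];
sort($fruit, SORT_STRING);            // compare as strings, byte by byte
echo implode(' ', $fruit), "\n";      // apple pear z zz

// Byte order is not alphabetical order in general: every uppercase ASCII
// letter (0x41-0x5A) sorts before every lowercase one (0x61-0x7A).
var_dump(strcmp('Pear', 'apple') < 0); // bool(true)
```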

Nitramp 5746 days ago

Note that there is no such thing as "the alphabetical ordering". Different languages define different collations, and some even multiple ones (e.g., German collation vs German phone book collation, or the various collation systems for Chinese characters). I'm pretty sure PHP's comparison operator will define non-ASCII characters as being outside of the alphabet, and probably just fail on multi-byte strings (UTF-8).

So if you are doing comparisons on strings, you probably either have an i18n bug or a really, really specific use case.
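
To make the multi-byte point concrete: PHP strings are byte strings, so UTF-8 input does not make `sort()` raise an error, but it is ordered by raw bytes rather than by any collation. The sketch assumes the source file is saved as UTF-8; the word list is illustrative:

```php
<?php
// 'é' is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9.
// 0xC3 is greater than 'z' (0x7A), so byte-wise sorting puts 'école' last.
$words = ['zebra', 'école', 'apple'];
sort($words, SORT_STRING);
echo implode(' ', $words), "\n"; // apple zebra école
echo strlen('é'), "\n";          // 2: strlen() counts bytes, not characters
```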

roel_v 5746 days ago

That's what I was thinking, and in this case 'zz' would be > 'a'. I'd say that's a reasonable choice.

trezor 5745 days ago

Clearly you have not ventured too far outside the "safe" en-US cultures ;)

In the Scandinavian languages (Danish, Swedish and Norwegian), "aa" is (for whatever historical reasons) considered a synonym for "å", which is the last letter of the alphabet. In the same way, "ae" maps to "æ" (third-to-last) and "oe" maps to "ø" (second-to-last).

Hence, using culture-aware sorting, the following array may actually not be sorted: [ "a", "aa", "ae", "b", "oe", "of" ]. You will find similar sort behaviour in a lot of databases when you set the database/table/column collation to other cultures.
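
A greatly simplified sketch of why that array stops being sorted, assuming every "ae", "oe" and "aa" is read as æ, ø and å (real collations are more nuanced). The hypothetical `digraph_sort_key()` maps the digraphs to the ASCII bytes right after 'z', so they land at the end of the alphabet in æ, ø, å order:

```php
<?php
// Hypothetical key: 'ae' -> '{' (0x7B), 'oe' -> '|' (0x7C), 'aa' -> '}' (0x7D).
function digraph_sort_key(string $word): string
{
    // strtr() with an array never rescans text it has already replaced.
    return strtr($word, ['ae' => '{', 'oe' => '|', 'aa' => '}']);
}

$bytewise = ['a', 'aa', 'ae', 'b', 'oe', 'of'];
sort($bytewise, SORT_STRING);
echo implode(' ', $bytewise), "\n"; // a aa ae b oe of  (already in byte order)

$danish = ['a', 'aa', 'ae', 'b', 'oe', 'of'];
usort($danish, fn(string $x, string $y): int => strcmp(digraph_sort_key($x), digraph_sort_key($y)));
echo implode(' ', $danish), "\n";   // a b of ae oe aa
```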

While I agree the PHP implementation is silly, your blanket dismissal of the possibility that 'aa' could be greater than 'z' is equally ignorant.

auxbuss 5745 days ago

A couple of years ago I had to devise PHP sort algorithms for Danish, German, and Russian. (Unsurprisingly, native speakers are essential for testing this kind of thing ;) )

It was surprising at the time that these sort algorithms were not generally available in code. We also found that we couldn't rely on the db (MySQL) to do the right thing, which was not helped, of course, by having these character sets, and others, in the db. The db was fully utf8.

Once you grok PHP's underlying method, it's simply a case of using the above-mentioned technique (replacing the "foreign" symbols with character pairs) to establish the correct order.

In the Danish case above, for example, I used zx, zy, and zz.
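
A sketch of that substitution trick for Danish, assuming lowercase UTF-8 input; `danish_sort_key()` is a hypothetical helper, not the original code. Without it, raw UTF-8 bytes put the letters in the wrong order (å is 0xC3 0xA5, æ is 0xC3 0xA6, ø is 0xC3 0xB8, giving å, æ, ø instead of æ, ø, å):

```php
<?php
// Rewrite each letter that sorts after 'z' as a 'z' pair, then compare bytes.
// Caveat: genuine 'zx'/'zy'/'zz' sequences (e.g. "jazz") collide with these keys,
// and uppercase Æ/Ø/Å would need their own entries or a lowercasing step.
function danish_sort_key(string $word): string
{
    return strtr($word, ['æ' => 'zx', 'ø' => 'zy', 'å' => 'zz']);
}

$words = ['å', 'øl', 'zebra', 'æble', 'bil'];
usort($words, fn(string $a, string $b): int => strcmp(danish_sort_key($a), danish_sort_key($b)));
echo implode(' ', $words), "\n"; // bil zebra æble øl å
```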

zackattack 5746 days ago

'z'++ is not 'aa'. 'z'++ is not even valid PHP. $z='z';$p=$z++; does nothing either.

It's only in the for-loop context.

Nitramp 5746 days ago

Not really: after $p = $z++, $p is 'z' and $z is 'aa', as you'd kind of expect. But the difference between literals and expressions/variable references of course just adds icing to the cake of bizarreness.
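
A sketch of the post- versus pre-increment distinction, assuming PHP 7 or 8; `$z` and `$p` follow the snippet above, while `$y` and `$q` are added for comparison. (Incrementing a literal such as 'z'++ is indeed a syntax error, because `++` needs a variable.)

```php
<?php
$z = 'z';
$p = $z++;          // post-increment: $p gets the old value, then $z becomes 'aa'
var_dump($p, $z);   // string(1) "z", then string(2) "aa"

$y = 'z';
$q = ++$y;          // pre-increment: $y becomes 'aa' first, then $q gets that value
var_dump($q);       // string(2) "aa"
```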

rythie 5746 days ago

try...

<?php $z='z'; echo ++$z;

which produces 'aa'
