by pieter 5104 days ago
That still doesn't make sense. If 'i' is not the lowercase equivalent of 'I', then the lowercasing should just result in another letter, right? The only thing that could cause the bug is if it uses two different ways of lowercasing (perhaps one when registering the class, and another way when looking up the class).

The mapping between uppercase and lowercase can be completely arbitrary, and as long as it's used consistently you shouldn't get this kind of bug.

2 comments

It's really not that simple. Check the "Fold Case" section of the Letter Case article on Wikipedia; the explanation is much better:

http://en.wikipedia.org/wiki/Letter_case#Unicode_case_foldin...

It's not PHP's fault that accurately performing case transformations across locales is difficult; it's just actually very difficult. The solution isn't to "fix" the process of transforming letter case; the solution is to simply not transform the names of your identifiers. Unfortunately that is simple only in a very isolated setting; in the real world, doing such a thing is liable to break a lot of software.

This is a really good example of the problem at hand:

>The Greek letter Σ has two different lowercase forms: "ς" in word-final position and "σ" elsewhere.

The identifiers are lowercased more than once: first at parse time, presumably using the locale of the OS, some setting in php.ini, or some fixed locale. (In practice it doesn't matter where this initial locale comes from; it just matters that it's in effect at parse time.) The identifier is then lowercased again at runtime; if the locale was changed in between, and the casing rules of the two locales differ for any character in the name, the identifier will not be found.
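A minimal sketch of that mismatch, using hand-written stand-ins rather than PHP's real locale machinery. The names `lower_ascii`, `lower_turkish`, `$registry` and `ClassInfo` are all made up for illustration, and the Turkish rule is reduced to the single 'I' to dotless 'ı' mapping:

```php
<?php
// Hypothetical stand-ins for locale-dependent lowercasing; these are
// NOT PHP internals, just two different case mappings.
function lower_ascii(string $s): string {
    // Maps only A-Z to a-z, so 'I' becomes 'i'.
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

function lower_turkish(string $s): string {
    // Turkish rule (simplified): 'I' lowercases to dotless 'ı' (U+0131).
    return lower_ascii(str_replace('I', "\u{0131}", $s));
}

// "Parse time": the class name is stored under its lowercased key.
$registry = [];
$registry[lower_ascii('ClassInfo')] = true;

// "Runtime": the lookup lowercases the name again.
$found_same  = isset($registry[lower_ascii('ClassInfo')]);   // same rules
$found_other = isset($registry[lower_turkish('ClassInfo')]); // rules changed

var_dump($found_same);   // bool(true)
var_dump($found_other);  // bool(false)
```

With the same mapping on both sides the lookup succeeds no matter how odd the mapping is; it only fails once the two sides disagree on what 'I' lowercases to.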

I'm not saying this to defend PHP; just to shed some light on the case-folding problem. Having case-insensitive identifiers is a design mistake.

The issue only occurs when the locale is changed between registering and looking up the class.

Doesn't look like that from the bug report; there the locale is set first, then the class is defined and then looked up.

PHP registers classes (and functions) at parse time, not at execution time.

I.e. this will print "bar", even though foo() is called before its definition appears in the file:

    <?php
    echo foo();
    function foo() { return "bar"; }