Hacker News
by shuzchen 5095 days ago
If it wasn't clear from the comments on the bug report or from the quoted sections of this comment's parent, let me rephrase it. This issue is caused entirely by the fact that PHP is case insensitive for class and function names (but not variables, go figure). That is, if you define a class MyClass, you can instantiate it using MyClass or myclass or MYCLASS. You can also call standard library functions in whatever case you like (so array_map or ARRAY_MAP is fine).
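
To make that concrete, here is a minimal sketch. The class and variable names are made up for illustration; only `array_map`, `get_class` and `var_dump` are real PHP functions.

```php
<?php
// Hypothetical class, used only for this illustration.
class MyClass {
    public function hello() { return "hi"; }
}

// Class names resolve regardless of the case used at the call site.
$a = new MyClass;
$b = new myclass;
$c = new MYCLASS;
var_dump(get_class($b) === get_class($c)); // bool(true): same class

// Built-in function names are case insensitive as well.
$doubled = ARRAY_MAP(function ($n) { return $n * 2; }, array(1, 2, 3));

// Variable names are case sensitive: these are two distinct variables.
$value = 1;
$VALUE = 2;
var_dump($value === $VALUE); // bool(false)
```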

Based on the behavior of this bug, it appears that PHP handles this case insensitivity by simply lowercasing all class and function names before resolving them. The bug shows up for Turkish in particular because, under Turkish casing rules, 'i' is not the lowercase equivalent of 'I' (the lowercase of 'I' is the dotless 'ı').
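
A rough sketch of why the lowercased name changes under Turkish rules. `lower_c` and `lower_tr` are hypothetical helpers written for this comment, not PHP internals; they only imitate ASCII and Turkish lowercasing of a UTF-8 string.

```php
<?php
// Hypothetical helper: lowercase with ASCII-only rules, as the C locale would.
function lower_c($name) {
    return strtr($name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// Hypothetical helper: same, except 'I' becomes dotless 'ı' (U+0131),
// which is the one Turkish rule that matters for ASCII identifiers.
function lower_tr($name) {
    return lower_c(strtr($name, array('I' => 'ı')));
}

echo lower_c('UserInfo'), "\n";   // userinfo
echo lower_tr('UserInfo'), "\n";  // userınfo
```

Any identifier containing a capital 'I' ends up with a different lowercased form, so a lookup keyed on one form will miss the other.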

Pretty much all other modern languages are case sensitive, so I'd be surprised to find this issue elsewhere.

2 comments

Well, VB.NET is case insensitive yet the problem doesn’t crop up there because it’s not braindead enough to use the same locale while compiling & executing. Yes, I get that PHP code isn’t compiled in a separate step but there still is no reason for it to use a user-defined locale. It should use the C locale, end of story. I don’t understand why this isn’t trivial to fix. Is there any place where PHP depends on a user-defined locale for parsing?

EDIT: “trivial to fix” as in, doesn’t cause regression, not necessarily that it’s a small change to the code base.

Class names can crop up during execution as well though. This is valid PHP:

  $classname = $row_I_got_from_mysql['classname'];
  $object = new $classname;
I'm sure this can still be solved though. It's not trivial, but it's not "takes over 9 years to fix" complex either.
PHP could just use the approach NTFS uses on Windows and convert to upper case instead:

http://blogs.msdn.com/b/michkap/archive/2004/12/02/273619.as...

Which would not help in this case, since in Turkish the upper-case form of `i` is not `I` but a different symbol (the dotted `İ`). So the class you're looking for would not exist.
Oops - looks like you're actually right there. For some reason, I thought I'd read that i and ı were both mapped to I.
That still doesn't make sense. If 'i' is not the lowercase equivalent of 'I', then the lowercasing should just result in another letter, right? The only thing that could cause the bug is if it uses two different ways of lowercasing (perhaps one when registering the class, and another way when looking up the class).

The mapping between uppercase and lowercase can be completely arbitrary, and as long as it's used consistently you shouldn't get this kind of bug.

It's really not that simple. Check the "Fold Case" section of the Letter Case article on Wikipedia; the explanation is much better:

http://en.wikipedia.org/wiki/Letter_case#Unicode_case_foldin...

It's not PHP's fault that accurately performing case transformations across locales is difficult; it genuinely is very difficult. The solution isn't to "fix" the process of transforming letter case; the solution is simply not to transform the names of your identifiers. Unfortunately, that is simple only in a very isolated setting; in the real world, doing such a thing is liable to break a lot of software.

This is a really good example of the problem at hand:

>The Greek letter Σ has two different lowercase forms: "ς" in word-final position and "σ" elsewhere.
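
A tiny sketch of what that means for case mapping. The two-entry table below is hand-written for illustration (it is not a Unicode library), but both entries match Unicode's uppercase mapping for sigma.

```php
<?php
// Illustrative excerpt: both lowercase sigmas share one uppercase letter.
$upper = array('σ' => 'Σ', 'ς' => 'Σ');

$medial = strtr('σ', $upper);  // Σ
$final  = strtr('ς', $upper);  // Σ

// Uppercasing loses the distinction, so going back from 'Σ' to lowercase
// cannot be a plain table lookup; it depends on the position in the word.
var_dump($medial === $final);  // bool(true)
```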

The identifiers are lowercased multiple times: first at parse time, presumably using the locale of the OS, some setting in php.ini, or some fixed locale. (In practice it doesn't matter where this initial locale is set; it just matters that it's set at parse time.) They are then lowercased again at runtime; if the locale was changed at runtime, such that the casing rules in the two locales produce any differences, the identifier will not be found.
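
Here is a simulation of that sequence, as a sketch only. `Registry`, `$ascii` and `$turkish` are hypothetical stand-ins written for this comment; the real lookup happens in C inside the engine and may differ in detail.

```php
<?php
// Hypothetical registry that mimics "lowercase the name, then look it up".
// $fold stands in for whatever lowercasing the active locale provides.
class Registry {
    private $classes = array();
    public $fold;

    public function __construct($fold) { $this->fold = $fold; }

    public function register($name) {
        $this->classes[call_user_func($this->fold, $name)] = $name;
    }

    public function has($name) {
        return isset($this->classes[call_user_func($this->fold, $name)]);
    }
}

// ASCII-only lowercasing, as in the C locale.
$ascii = function ($s) {
    return strtr($s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
};
// Turkish-style lowercasing: 'I' becomes dotless 'ı'.
$turkish = function ($s) use ($ascii) {
    return $ascii(strtr($s, array('I' => 'ı')));
};

$registry = new Registry($ascii);      // "parse time": initial locale
$registry->register('UserInfo');       // stored under "userinfo"
$before = $registry->has('UserInfo');  // same fold on both sides

$registry->fold = $turkish;            // "runtime": locale switched
$after = $registry->has('UserInfo');   // now asks for "userınfo"

// Using one fold consistently works, even the Turkish one.
$tr_only = new Registry($turkish);
$tr_only->register('UserInfo');
$consistent = $tr_only->has('UserInfo');

var_dump($before, $after, $consistent); // bool(true) bool(false) bool(true)
```

The failure only needs the two folds to disagree on a single character in the name; which locale came first is irrelevant.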

I'm not saying this to defend PHP; just to shed some light on the case-folding problem. Having case-insensitive identifiers is a design mistake.

The issue only occurs when the locale is changed between registering and looking up the class.
Doesn't look like that from the bug report; there the locale is set first, then the class is defined and then looked up.
PHP registers classes (and functions) at parse time, not at execution time.

i.e. this will print "bar":

    <?php
    echo foo();
    function foo() { return "bar"; }