| I agree with most of your bashing. PHP is a fucking mess, and non-OO strings are a hangover vomit from C. That being said, when there is a problem, there actually is a solution. PCRE actually works. JavaScript, for instance, has no collation support at all as far as I can tell. Personally, most of the problems I run into with UTF-8 are mixed-content issues. The best fix is not the Python route, but rather just deprecating a bunch of stuff such as utf8_encode/utf8_decode. Throw a warning when any database connection is not utf8. Throw a warning when the OS is not set up to return UTF-8. It is more important that people run PHP in an end-to-end UTF-8 environment than that the internals change. Once people have a good environment, they will stop talking about strlen/strpos, which are really not much of a problem. Maybe they should be renamed bytelen/bytepos, but PHP has too many problems of that type to count. 99% of the UTF-8 problems don't exist if everything is UTF-8. Counting Unicode code points vs bytes is not the real problem. The real problem is bullshit like 'SET NAMES utf8' / setlocale('LC_ALL','en_US.utf-8'). BTW, what language do you think gets this stuff right? Go looks promising, but it is brand new. I have problems with pretty much every language I know well: JavaScript/Python/Objective-C/PHP |
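
To make the bytes vs code points distinction and the mixed-content problem concrete, here is a rough sketch. It is written in Python rather than PHP purely for illustration; the sample string and variable names are made up, and the Latin-1 round trip only approximates what happens when something like utf8_encode is applied to data that is already UTF-8.

```python
# Illustration only: byte length vs. code point count for one UTF-8 string,
# and what a mixed-content (double encoding) mistake does to it.

text = "na\u00efve caf\u00e9"          # "naïve café", precomposed characters

utf8_bytes = text.encode("utf-8")

byte_len = len(utf8_bytes)             # what a byte-oriented strlen reports
code_points = len(text)                # Python 3 str counts code points

# The same substring sits at a different offset depending on the unit.
byte_pos = utf8_bytes.find("caf\u00e9".encode("utf-8"))
char_pos = text.find("caf\u00e9")

# Mixed content: UTF-8 bytes misread as Latin-1, then encoded to UTF-8 again.
# This mimics converting "Latin-1 to UTF-8" on data that was already UTF-8.
misread = utf8_bytes.decode("latin-1")
double_encoded = misread.encode("utf-8")

print(byte_len, code_points)           # 12 10
print(byte_pos, char_pos)              # 7 6
print(len(double_encoded))             # 16
```

The byte and code point counts differ by a small, predictable amount and both are correct answers to different questions. The double-encoded value is the one that is actually broken, and that only happens when one layer in the chain is not speaking UTF-8.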