Hacker News
by kitsune_ 5096 days ago
Words fail me.
4 comments

I know it seems insane, but Turkish capitalization is not fun to work with as a programmer. When they latinized the alphabet 100 years ago or so, they were short on vowels, and so it must have seemed pretty clever and convenient to make i and I separate letters, with İ and ı as their respective case pairs. From a Western programmer's perspective, though, it's one of the worst Unicode special cases, owing to its combined unexpectedness and commonness.
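
To make the pairing concrete, here is a minimal Python sketch. The helper names `tr_upper` and `tr_lower` are made up for illustration and are not part of any library; the sketch assumes the only Turkish-specific rule needed is the i/İ and ı/I pairing, and leaves everything else to Python's built-in, locale-independent case mapping.

```python
# Hypothetical helpers (names are assumptions, not a library API).
# Turkish pairs: i <-> İ (dotted) and ı <-> I (dotless).

def tr_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ and ı -> I."""
    # Handle the two special letters first, then let str.upper() do the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()


def tr_lower(text: str) -> str:
    """Lowercase with Turkish rules: İ -> i and I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


if __name__ == "__main__":
    word = "deniz"
    print(word.upper())     # default mapping: dotless capital I
    print(tr_upper(word))   # Turkish mapping: dotted capital İ
    print(tr_lower("ISI"))  # Turkish mapping: dotless ı
    # The default mapping turns İ into "i" plus a combining dot (2 code points).
    print(len("İ".lower()))
```

The point is only that the same input has two different "correct" answers, depending on which language the text is in, and the string itself doesn't tell you which one applies.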

Just as an example, text-transform: uppercase was broken for Turkish in all major browsers until, I believe, Firefox finally fixed it late last year, after having a bug open for nearly a decade.

From my point of view there could be one very simple solution: just add new code points in Unicode for Turkish I and i. So the Latin i would follow the common case conventions, and Turkish i would use whatever crazy stuff they have there.

Of course that might be a bit late to do now; there is probably too much text encoded in the current format.

It's probably a bit late, agreed, but it seems to me this problem is just as much the fault of the encoding itself as it is the fault of PHP: Turkish i and I may look like Western European i and I, but they're entirely different characters.
If only they had used Ï and ï instead, everything would have been much simpler.
Just out of curiosity, how do you know that they were short on vowels?
It's funny, the first thing I thought is "someone was having trouble with the Turkish I and tried a hackaround, and now it's unfixable."

I blame Atatürk. If I had a time machine, I'd skip killing Hitler and travel back to the language reform time. "Do you know how much trouble this is going to cause us? Reuse the X, make one a dotted e. I don't care, this is going to fuck everything up!"

Take the case of software that insists on a state field and is hence unusable outside the US. Do you blame that error on George Washington?
Şimdi İstanbul'da oturuyorum. :) (I think that's right, I'm still learning the language...) The comment was meant to be snarky -- obviously, the e-i-ö-ü \ a-ı-o-u rule would be broken, which is the reason for the undotted I. Further, nobody could have anticipated in 1927 the vast extent of automation that we are going through now.

For those that don't know Turkish, there is a facet of the language called vowel harmony. When suffixes are added to a word, which is common in Turkish for everything from pluralization and verb conjugations to prepositions, the vowels in the suffix are altered to match the last vowel of the word. (Some Arabic loanwords don't follow this, mind you, but it works 98% of the time.) So, the dotted and undotted vowels (except e for some reason) all follow this pattern.
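
A rough sketch of the simplest case, the plural suffix, which is -ler after a front vowel (e, i, ö, ü) and -lar after a back vowel (a, ı, o, u). The function name `plural` is made up for illustration, the input is assumed to be lowercase already, and loanword exceptions are deliberately ignored.

```python
# Two-way vowel harmony for the plural suffix (a simplification:
# lowercase input assumed, loanword exceptions not handled).
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def plural(word: str) -> str:
    """Append -ler or -lar depending on the last vowel of the word."""
    for ch in reversed(word):
        if ch in FRONT_VOWELS:
            return word + "ler"
        if ch in BACK_VOWELS:
            return word + "lar"
    raise ValueError(f"no vowel found in {word!r}")


if __name__ == "__main__":
    for w in ["ev", "kitap", "kız", "göz"]:
        print(w, "->", plural(w))
```

Note that i and ı land in different sets, which is why the dotless ı is a vowel in its own right and not a typographic variant.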

(Incidentally, however, the problems could be solved by turning the single dotted i into a double dotted I, keeping the original symmetry. At that point, a lowercase dotted I would no longer break any system, since you could map them to be functionally equivalent for anything after the reform. While we're at it, I have a few ideas for English language reform...)

You are missing the point that the "new" alphabet was designed long before the "systems".

The right way would be to get involved in the process of developing standards, not to change some characters because some new tech came along and has problems with the language (going on a rampage, as you mentioned earlier, isn't a valid solution either).

I don't quite understand your point here. I understand which came first, by quite some time. It happens that this is a stickier problem than merely "developing standards." What happens when an American tries to log in from a Turkish terminal? If everything is made case insensitive, i turns into İ rather than I. Similarly, what happens when a Türk logs in from another terminal? Do you have the locale attached to the user? (Public terminals can be an issue if someone can't change the keyboard layout. Do they log in as denIz? Does that work? It might if everything is brought to upper case, but not lowercase.)
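
A small sketch of the login problem, under the assumption that usernames are compared after lowercasing. `same_user` and `tr_lower` are hypothetical names, not any real system's API; `tr_lower` stands in for what a Turkish-locale terminal would do.

```python
from typing import Callable

# Hypothetical stand-in for a Turkish-locale lowercase: İ -> i, I -> ı.
def tr_lower(text: str) -> str:
    return text.replace("İ", "i").replace("I", "ı").lower()


def same_user(typed: str, stored: str, lower: Callable[[str], str]) -> bool:
    """Case-insensitive comparison using whichever lowercase rule is in effect."""
    return lower(typed) == lower(stored)


if __name__ == "__main__":
    stored = "deniz"  # account name as saved
    typed = "DENIZ"   # typed with a plain capital I
    print(same_user(typed, stored, str.lower))  # locale-independent rule
    print(same_user(typed, stored, tr_lower))   # Turkish rule
    print(tr_lower(typed))                      # what the Turkish rule produced
```

Same two strings, and whether they match depends entirely on which rule the comparing machine applies. So the rule has to be fixed per system or per account, not picked up from whatever terminal the user happens to be sitting at.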

The Turkish I is one of the most interesting issues in internationalization.

"I'd skip killing Hitler" would still be a bad idea. You cannot compare the Shoah to language reform.
http://www.abyssapexzine.com/wikihistory/ is the reference. I assumed a common geek humor touchpoint.

"...everybody kills Hitler on their first trip. I did. It always gets fixed within a few minutes, what’s the harm?"

In a way, that is amazing. Truly.
They fail for Turkish PHP developers also. You're in good company!