by dotancohen 3212 days ago
> A more interesting question is whether UCS-4's advantages are worth it. It provides an array of characters, but as the years pass, the code I see does ever less char-array processing on strings. 20-30 years ago the world was full of char pointers, now, not so much. Something like this looks more typical, and doesn't benefit much from UCS-4, if at all: foo.split(" ").each{|word| bar(word) }.

You are looking at the issue from the perspective of a language user, not a language designer. 20 years ago we didn't have languages such as Python/Ruby which had internal multibyte support in their string manipulation functions. 20 years ago string manipulation functions didn't even exist!

But this post is about the design of the language, not the application, and the language is still written in C/C++ and _internally_ stores strings as byte arrays that must be presented nicely to the programmer in that language's string manipulation functions.
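
To make the byte-array point concrete, here is a rough sketch in Ruby. The sample string and the variable names are made up for illustration, and it assumes the string is UTF-8 encoded, which is what Ruby gives you for a literal containing a `\u` escape. It only shows the gap between what the language user sees and what is stored; it says nothing about how any particular interpreter is implemented internally.

```ruby
# Hypothetical sample string; "\u20AC" is the euro sign.
s = "cost 5 \u20AC"

char_count = s.length    # characters, as the language user sees them
byte_count = s.bytesize  # bytes in the UTF-8 storage underneath

puts "characters: #{char_count}, bytes: #{byte_count}"

# The euro sign alone occupies three bytes in UTF-8.
euro_bytes = "\u20AC".bytes.map { |b| format("%02X", b) }
puts euro_bytes.join(" ")

# The split-and-iterate style from the quoted comment works on whole words
# and never indexes into the string by character position.
words = s.split(" ")
words.each { |word| puts word }
```

The character count and the byte count differ as soon as a non-ASCII character shows up, and hiding that difference is the job of the language's string functions.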

1 comment

> You are looking at the issue from the perspective of a language user, not a language designer. 20 years ago we didn't have languages such as Python/Ruby which had internal multibyte support in their string manipulation functions.

20 years ago was 1997. I'm reasonably certain NSString has been Unicode-aware for much longer than that.

> 20 years ago string manipulation functions didn't even exist!

What kind of absolute utter nonsense is that?

> But this post is about the design of the language, not the application, and the language is still written in C/C++ and _internally_ stores strings as byte arrays that must be presented nicely to the programmer in that language's string manipulation functions.

So?

That should have read: _30_ years ago, string manipulation functions didn't exist.

NSString may have been Unicode-aware (I've never used Objective-C), and I believe that even early versions of Java supported multibyte strings, but at that time most business and consumer desktop applications in the Windows world were still written in C/C++. Do you remember when the Euro symbol became common? I'm pretty sure that character alone was responsible for much of the push to support Unicode.