by paulddraper 3386 days ago
> You are not a high-level language if your standard library struggles with Unicode

So C++, Lisp, Java, Python, Ruby, PHP, and JS are not high-level languages.

HN teaches me something new every day.


What would you say Java is missing? Sure, it does have the "oops, we implemented Unicode when they said we only needed 16 bits" problem, but unlike, say, JS, it actually handles astral plane characters well (e.g., the regex implementation actually says that . matches an astral plane code point rather than half of one).

It does have all the major Unicode annexes--normalization (java.text.Normalizer), grapheme clusters (java.text.BreakIterator), BIDI (java.text.Bidi), line breaking (java.text.BreakIterator), not to mention the Unicode script and character class tables (java.lang.Character). And, since Java 8, it does have a proper code point iterator over character sequences.
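
To make the "astral plane code point rather than half of one" point concrete, here is a minimal sketch. It is in Python 3 rather than Java, purely for illustration, and the sample string is an assumption of mine, not something from the thread. It shows one code point outside the BMP taking two UTF-16 code units (which is what Java's String.length() counts), while a code-point-aware regex engine still treats it as a single match for `.`.

```python
import re

# Hypothetical sample: 'a' followed by U+1F600, which lies outside the BMP.
s = "a\U0001F600"

# Python 3 strings are sequences of code points, so the emoji counts once.
code_points = [hex(ord(ch)) for ch in s]

# In UTF-16 the same emoji needs a surrogate pair: two 16-bit code units.
utf16_units = len(s.encode("utf-16-le")) // 2

# '.' consumes the whole astral code point, not one surrogate half.
dot_matches = re.findall(".", s)

print(len(s), code_points, utf16_units, dot_matches)
```

An engine that matched on UTF-16 code units instead would report three matches for `.` here, two of them lone surrogates.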

I stand corrected. Java 8 has everything you could expect.
Python 3 does pretty well.
It does better than most, though Python 3 lacks grapheme support in the standard library, requiring developers to use a library like uniseg. I.e. it "lacks an effective way to deal with text that doesn't involve dragging in third-party libraries", and is thus evidently not a "high-level language".
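
A rough sketch of what that gap looks like, using only the standard library (the sample strings are my own, picked for illustration): unicodedata gives you normalization, and str works in code points, but nothing here tells you where one user-perceived character ends and the next begins.

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT: two code points, one user-perceived character.
decomposed = "e\u0301"

# NFC normalization composes this particular pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# Two regional indicator symbols, typically rendered as a single flag.
flag = "\U0001F1EB\U0001F1F7"

# Normalization has no precomposed form for the flag, so it stays two code points.
flag_nfc = unicodedata.normalize("NFC", flag)

# len() and slicing operate on code points, so slicing can cut the flag in half.
half_flag = flag[:1]

print(len(decomposed), len(composed), len(flag), len(flag_nfc), len(half_flag))
```

Normalization happens to rescue the accented letter, but it is not a substitute for grapheme segmentation, which is the part a library like uniseg fills in.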
How does Ruby struggle with Unicode?
This is from a couple of weeks ago. A few things are still broken, but which languages do have full support out of the box?

http://blog.honeybadger.io/ruby-s-unicode-support/

Swift
Go
Well, for one, I can't even write a portable Unicode string literal.

> "\xAA".split ''

That works on a platform where the encoding is UTF-32, but not on one where it is UTF-8.