by reom_tobit 1863 days ago
Tangent, but I’ve never really understood what the issue is with Unicode support in various languages.

Does anyone have any idea why it’s such a contentious issue?

2 comments

Unicode reflects a reality about human writing systems. They are very complicated. This is more or less guaranteed to result in Unicode being contentious.

After all, it's obvious that the features my native language has are important and need to be first-class APIs in the standard library, while any features that language doesn't have aren't important, and the standard library shouldn't be clogged up with anything so useless. Also, things that are easy to do for my preferred writing system must be supported; if the easy way to implement them doesn't work for some other widely used languages, just ignore that, those people don't matter anyway.
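
To make that less abstract, here is a rough Python sketch of two places where the "easy way" breaks down. The variable names are just mine for illustration; the calls are standard `str` and `unicodedata` ones.

```python
import unicodedata

# Uppercasing is not always one character in, one character out:
# German sharp s uppercases to two letters in the default mapping.
sharp_s_upper = "\u00df".upper()          # "SS"

# The same visible "é" can be stored two ways.
composed = "\u00e9"                       # single code point
decomposed = "e\u0301"                    # "e" + combining acute accent

same_before = composed == decomposed      # False: different code points
same_after = unicodedata.normalize("NFC", decomposed) == composed  # True

print(sharp_s_upper, len(composed), len(decomposed), same_before, same_after)
```

So even "compare two strings" or "uppercase this" needs a decision about whose rules apply, and that's before you get to locale-specific cases.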

Basically because there are two major ways to do it: the Windows way (UTF-16) and the Unix way (UTF-8). Unicode defines several encodings and doesn't tell you which one to use.
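
If it helps, here's a small Python sketch of what "same text, different encoding" means in practice. The sample string is arbitrary, and I'm using the little-endian UTF-16 variant without a byte-order mark to keep the byte counts simple.

```python
# "héllo " followed by one emoji outside the Basic Multilingual Plane.
text = "h\u00e9llo \U0001F642"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no byte-order mark

print(len(text))    # 7 code points
print(len(utf8))    # 11 bytes: ASCII is 1 byte, é is 2, the emoji is 4
print(len(utf16))   # 16 bytes: 2 per character, 4 for the emoji

# Both byte sequences decode back to the same text.
round_trip_ok = utf8.decode("utf-8") == utf16.decode("utf-16-le") == text
print(round_trip_ok)
```

Same characters, different bytes on the wire, and a program that guesses the wrong encoding gets garbage.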

The Unix way is winning on the web, and I think Microsoft has made some moves toward UTF-8, but I don't understand what they are exactly:

https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#W...

JavaScript and Java inherited the Windows way. Go and Rust use the Unix way (and apparently OCaml too). Python supports both, which some say is a needless source of complexity, but it is flexible if you know how to use it.
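
One place this shows up is in what "length" means. A rough sketch in Python, where the two helper functions are hypothetical names of mine that count the units the other languages report:

```python
def utf16_code_units(s: str) -> int:
    """Count 16-bit code units, the unit JavaScript and Java string lengths use."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Count UTF-8 bytes, the unit Go and Rust string lengths use."""
    return len(s.encode("utf-8"))


# Python 3's own len() counts code points.
for s in ["a", "\u00e9", "\U0001F642"]:
    print(repr(s), len(s), utf16_code_units(s), utf8_bytes(s))
```

For the emoji you get 1, 2 and 4 respectively, so three languages can disagree about the length of the same one-character string without any of them being buggy.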

Awesome, thanks for the info. Sent me down a rabbit hole for a little bit.

In case you didn't already get it, this is a good and readable summary:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...

Amazing link, would highly recommend it to anyone reading this thread. Thanks again.