Hacker News

by 3PS 846 days ago

Seems like an interesting language with a lot of new ideas, especially the declarative concurrency approach. I am very disappointed by the decision to go with UTF-16 for strings, though [0], and strongly urge the author to reconsider. UTF-16 is the worst of all worlds: it is inefficient for ASCII-heavy text (two bytes per character instead of one), it is endianness-dependent, it is still variable-width like UTF-8 for code points outside the Basic Multilingual Plane (like emojis!), and it adds an O(n) transcoding penalty to processing most text from the internet, which is overwhelmingly UTF-8. It also means you need a separate, fundamentally different data type, distinct from char, to represent byte sequences.

Please, for modern software UTF-8 everywhere is the way to go!

https://utf8everywhere.org/

[0] https://docs.clarolang.com/common_programming_concepts/varia...
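A small Java sketch can make the size, surrogate-pair and byte-order points concrete. It uses only `String.getBytes` with `StandardCharsets`; the class and helper names are illustrative, and `UTF_16LE` is chosen deliberately because plain `UTF_16` prepends a 2-byte byte-order mark, which would skew the counts.

```java
import java.nio.charset.StandardCharsets;

public class EncodingComparison {
    // Bytes needed to store s as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store s as UTF-16 (little-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String ascii = "Hello";
        String emoji = "\uD83D\uDE00"; // U+1F600, outside the Basic Multilingual Plane

        // ASCII text: one byte per character in UTF-8, two in UTF-16
        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii)); // 5 vs 10

        // UTF-16 is variable-width too: one code point, two 16-bit code units (a surrogate pair)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.println(emoji.length());                          // 2
        System.out.println(utf8Bytes(emoji) + " vs " + utf16Bytes(emoji)); // 4 vs 4

        // Byte order matters: the same character serializes differently in BE and LE
        byte[] be = "A".getBytes(StandardCharsets.UTF_16BE);
        byte[] le = "A".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(be[0] + "," + be[1] + " vs " + le[0] + "," + le[1]); // 0,65 vs 65,0
    }
}
```

Note that the advantage is text-dependent: for ASCII UTF-8 is half the size, while for an emoji both encodings need four bytes.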


egnehots 846 days ago

It's based on the JVM, which uses UTF-16 internally for strings.


3PS 845 days ago

Understandable, though JVM strings can also use the UTF-8 charset under the hood. In fact, if you initialize a Kotlin string from a byte array, it'll default to assuming the UTF-8 charset [0]. (Kotlin chars are still 16-bit code units though, and it is true that you can't use the native JVM char type if you do this. Personally I think that's still acceptable.)

[0] https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/-st...
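The Kotlin reference above is truncated, so here is a rough plain-Java equivalent of building a string from UTF-8 bytes. It passes `StandardCharsets.UTF_8` explicitly because Java's charset-less `String(byte[])` constructor uses the platform default (UTF-8 by default only since JDK 18). The class and method names are illustrative. The decoded `String` still exposes 16-bit `char` code units, which is the caveat mentioned in the parenthetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decode UTF-8 bytes (e.g. from a network response) into a JVM String.
    // This transcodes into the JVM's internal representation: a pass over the input.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encode back to UTF-8 bytes, e.g. before writing the text out again.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "hi" followed by U+1F600 encoded as the 4-byte UTF-8 sequence F0 9F 98 80
        byte[] input = {0x68, 0x69, (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        String s = decode(input);

        System.out.println(s.length());                       // 4 UTF-16 code units
        System.out.println(s.codePoints().count());           // 3 code points
        System.out.println(Arrays.equals(encode(s), input));  // true: lossless round trip
    }
}
```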


Pet_Ant 846 days ago

> Please, for modern software UTF-8 everywhere is the way to go!

I think the real answer is to have a String interface and then allow for different implementations. There are times when it should be 8-bit ASCII. Or UCS-4. UCS-2 is reasonable. It's about trade-offs and how much constant width is worth. Just as you can choose among different list and set implementations for their algorithmic profiles, you should be able to choose string types. Something like UTF8#"Hello World" or EBCDIC#"IBM".

Inserting, appending, etc. a character that cannot be represented in a given string's encoding should throw an exception.
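A minimal Java sketch of that idea, assuming a hypothetical `EncodedString` interface (not an existing JVM, Kotlin or Claro API) with one hypothetical byte-backed implementation per charset. It uses the real `CharsetEncoder.canEncode(CharSequence)` check to throw on characters the chosen encoding cannot represent.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringInterfaceDemo {
    // Hypothetical interface illustrating the idea; not an existing library API.
    interface EncodedString {
        Charset charset();
        // Appends text, throwing if any character cannot be represented in charset().
        EncodedString append(String text);
        int byteLength();
        String decode();
    }

    // Hypothetical implementation that stores raw bytes in one fixed charset.
    // Use BOM-less charsets (e.g. UTF_16LE rather than UTF_16) so appends concatenate cleanly.
    static final class BytesString implements EncodedString {
        private final Charset charset;
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        BytesString(Charset charset) {
            this.charset = charset;
        }

        @Override public Charset charset() { return charset; }

        @Override public EncodedString append(String text) {
            // Fresh encoder per call: CharsetEncoder instances are stateful and not thread-safe.
            if (!charset.newEncoder().canEncode(text)) {
                throw new IllegalArgumentException(
                        "Cannot represent \"" + text + "\" in " + charset.name());
            }
            byte[] encoded = text.getBytes(charset);
            bytes.write(encoded, 0, encoded.length);
            return this;
        }

        @Override public int byteLength() { return bytes.size(); }

        @Override public String decode() { return new String(bytes.toByteArray(), charset); }
    }

    public static void main(String[] args) {
        EncodedString ascii = new BytesString(StandardCharsets.US_ASCII).append("Hello World");
        System.out.println(ascii.byteLength()); // 11
        try {
            ascii.append("\u00E9"); // 'é' is not in ASCII, so this throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        EncodedString utf8 = new BytesString(StandardCharsets.UTF_8).append("Hello World\u00E9");
        System.out.println(utf8.byteLength()); // 13: 'é' takes two bytes in UTF-8
    }
}
```

An EBCDIC variant could pass a code page such as `Charset.forName("IBM037")`, provided `Charset.isSupported` confirms the runtime ships it. A fixed-width UCS-4-style implementation would trade memory for O(1) code point indexing, which is the trade-off described above.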
