by tn13
1674 days ago
The English-speaking world developed its intuition about strings from ASCII, and that intuition simply fails when it comes to Unicode, which explains a lot of these pitfalls. String length under definition #2 is also fairly complex for some languages, such as Hindi. Hindi has some symbols that are not characters in their own right and can never stand alone, but when placed next to a character they create a new character. So when you type them on a keyboard you have to hit two keys, but only one character appears on screen. Unicode also represents this as separate code points, but to the human eye it is one character.

त् + या = त्या

The following code prints 4, one for each of the code points U+0924, U+094D, U+092F and U+093E:

console.log("त्या".length);
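
To make the different notions of "length" concrete, here is a rough sketch that measures the same string three ways. The helper name `lengths` is made up for illustration, and it assumes a runtime that ships `Intl.Segmenter` (recent browsers and Node.js). The grapheme count is deliberately not stated here, since it can depend on which Unicode version the runtime's segmentation rules follow.

```javascript
// Hypothetical helper: measure a string three different ways.
function lengths(s) {
  // UTF-16 code units, which is what String.prototype.length reports.
  const codeUnits = s.length;

  // Unicode code points: string iteration walks code points, not code units.
  const codePoints = [...s].length;

  // User-perceived characters (grapheme clusters), as segmented by the runtime.
  const segmenter = new Intl.Segmenter("hi", { granularity: "grapheme" });
  const graphemes = [...segmenter.segment(s)].length;

  return { codeUnits, codePoints, graphemes };
}

// "त्या" spelled out as its four code points: TA, VIRAMA, YA, vowel sign AA.
const tya = "\u0924\u094D\u092F\u093E";
console.log(lengths(tya));
```

Whatever the grapheme count comes out as on a given runtime, it is smaller than the 4 that `.length` reports, which is the mismatch described above.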
a.k.a. 'ligatures', as in f + f + i -> U+FB03 'ﬃ'
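
For comparison, a small sketch of how the 'ffi' ligature behaves in JavaScript. Unlike the Hindi example, U+FB03 is a single code point, and compatibility normalization (NFKC) expands it back to three letters. The variable names are only for illustration.

```javascript
// U+FB03 LATIN SMALL LIGATURE FFI is one code point, so one UTF-16 code unit.
const ligature = "\uFB03";
console.log(ligature.length);

// NFKC applies compatibility decompositions, turning the ligature into "ffi".
const expanded = ligature.normalize("NFKC");
console.log(expanded, expanded.length);

// Canonical normalization (NFC) leaves the ligature untouched.
const canonical = ligature.normalize("NFC");
console.log(canonical === ligature);
```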