by gfody
99 days ago
> I wonder what the best way is to handle this disparity for international software. It seems like either you punish the Latin alphabets, or the others.

There are over a million code points in Unicode; Latin plus other language-agnostic symbols, emoji, etc. account for only thousands of them. UTF-8 was designed to be backwards compatible with ASCII, not to encode all of Unicode efficiently. UTF-16 is the reasonably efficient compromise for native Unicode applications, which is why it's the internal string format in C#, SQL Server, and the like. The folks bleating about UTF-8 being the best choice make the same mistake as the "UTF-8 Everywhere" manifesto guys: stats skewed by a web/American-centric bias. Sure, UTF-8 is more efficient when your text is 99% markup and generally devoid of non-Latin scripts, but that's not my database, and probably not most people's.
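If you want to check the trade-off against your own data rather than anyone's stats, encoding sample text both ways is enough. This is a minimal sketch assuming Python 3.9+ (for the `tuple[int, int]` hint); the helper name `encoded_sizes` and all the sample strings are made up for illustration. It uses the `utf-16-le` codec so the count excludes the 2-byte BOM that Python's plain `utf-16` codec prepends.

```python
"""Compare UTF-8 and UTF-16 storage size for a few sample strings.

The samples are illustrative only (hypothetical); real results depend on your data.
"""


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for `text`.

    "utf-16-le" is used so no byte-order mark is added to the count.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Per-code-point cost (UTF-8 vs UTF-16) by Unicode range:
#   U+0000-007F: 1 vs 2
#   U+0080-07FF: 2 vs 2   (e.g. Cyrillic, Greek, Hebrew, Arabic)
#   U+0800-FFFF: 3 vs 2   (e.g. CJK, Japanese kana, most Indic scripts)
#   U+10000+:    4 vs 4   (e.g. emoji; UTF-16 needs a surrogate pair)
RANGE_EXAMPLES = {
    "U+0041 (ASCII)": "A",
    "U+0416 (Cyrillic)": "\u0416",
    "U+4E2D (CJK)": "\u4e2d",
    "U+1F600 (emoji)": "\U0001F600",
}

# Hypothetical whole-string samples, including CJK text wrapped in markup.
SAMPLES = {
    "English": "hello world",
    "Russian": "привет мир",
    "Chinese": "你好世界",
    "Japanese": "こんにちは",
    "Chinese inside HTML": '<p class="greeting">你好</p>',
}

if __name__ == "__main__":
    for label, text in {**RANGE_EXAMPLES, **SAMPLES}.items():
        utf8, utf16 = encoded_sizes(text)
        print(f"{label:22} utf8={utf8:3} utf16={utf16:3}")
```

On these samples, Chinese and Japanese alone are smaller in UTF-16 (8 vs 12 bytes for 你好世界), Cyrillic is roughly a wash (19 vs 20), and once the same Chinese text is wrapped in a little HTML, UTF-8 comes out ahead (30 vs 52). That is the markup effect the comment describes, so which encoding wins really does depend on what's in the column.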
1. https://en.wikipedia.org/wiki/UTF-16#Efficiency
2. https://en.wikipedia.org/wiki/UTF-8#Comparison_to_UTF-16