Hacker News
by tialaramex 2669 days ago
"Character Set" is usually the phrase.

A character set can be encoded in a variety of ways. For Unicode / ISO 10646, UTF-8 is the most popular encoding, for a variety of reasons that I'm sure will one day be an exciting historical artefact for HN readers to remark upon.
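
To make the set-versus-encoding distinction concrete, here is a rough Python sketch: one code point from the Unicode character set, serialized with three different encodings. The variable names are mine, and the choice of U+20AC is arbitrary.

```python
# One code point from the Unicode character set, serialized three ways.
euro = "\u20ac"  # U+20AC EURO SIGN

utf8 = euro.encode("utf-8")       # three 8-bit code units
utf16 = euro.encode("utf-16-be")  # one 16-bit code unit, big-endian, no BOM
utf32 = euro.encode("utf-32-be")  # one 32-bit code unit, big-endian, no BOM

print(utf8.hex(), utf16.hex(), utf32.hex())

# Every encoding decodes back to the same single code point.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == utf32.decode("utf-32-be")
```

The character set says which number the euro sign gets; the encoding only decides how that number is laid out in bytes.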

I don't like the word "character", because it tends to cause idiots to build software that treats Unicode code points as the indivisible unit out of which strings are made, and that's no more true of code points than it is of bytes. I prefer the nice fuzzy word "squiggle" when I mean the thing you as a human are perhaps imagining when you say "character", and I use nice technical terms like "pictogram", "grapheme", "glyph", "code point", "code unit", "symbol", and so on when I mean those specific technical things. But "character set" is the phrase we ended up with, so be it.
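
A small Python sketch of why code points aren't the indivisible unit. The two example strings and the variable names are my own picks for illustration; Python's `str` happens to index by code point, which is exactly the trap.

```python
import unicodedata

# One squiggle on screen (é), built from two code points.
squiggle = "e\u0301"  # 'e' + U+0301 COMBINING ACUTE ACCENT

code_points = len(squiggle)                           # Python str counts code points
utf8_units = len(squiggle.encode("utf-8"))            # 8-bit code units
utf16_units = len(squiggle.encode("utf-16-le")) // 2  # 16-bit code units

# Reversing by code point detaches the accent from its base letter.
broken = squiggle[::-1]

# NFC normalization happens to compose this pair into one code point...
composed = unicodedata.normalize("NFC", squiggle)

# ...but plenty of squiggles have no single-code-point form. A flag is two
# regional indicator symbols, and NFC leaves it alone.
flag = "\U0001F1EC\U0001F1E7"  # REGIONAL INDICATOR G + B
flag_nfc = unicodedata.normalize("NFC", flag)

print(code_points, utf8_units, utf16_units, len(composed), len(flag_nfc))
```

Getting at the squiggle level properly means grapheme cluster segmentation per Unicode's text segmentation rules, which Python's standard library does not provide, so the sketch only shows the problem rather than a fix.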