Hacker News
by dcrazy 243 days ago
Nowadays people expect their terminals to handle UTF-8, or at least the Latin-like subset of Unicode, without dealing with arcana such as codepages. Even with the simplest fonts, rendering something like í often requires drawing multiple glyphs: one for the dotless lowercase i stem and one for the acute accent. It so happens that the dotless lowercase i has its own codepoint (U+0131, ı), but in general the glyphs a font uses to draw an extended grapheme cluster do not map one-to-one onto that cluster's codepoints. So even “simple” console output is nowadays complected by the details of Unicode-aware text rendering.
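A quick way to see the gap between codepoints and what actually gets drawn is Python's standard unicodedata module. This is a small sketch, and the helper name describe is just illustrative. The same í can be stored as one precomposed codepoint or as two, and its canonical decomposition uses the ordinary dotted i plus a combining accent. The dotless ı (U+0131) that a font may draw as the stem does not appear in either form:

```python
import unicodedata

precomposed = "\u00ed"  # í stored as a single precomposed codepoint
decomposed = unicodedata.normalize("NFD", precomposed)  # canonical decomposition


def describe(s: str) -> list[str]:
    """List each codepoint in s with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


print(describe(precomposed))  # → ['U+00ED LATIN SMALL LETTER I WITH ACUTE']
print(describe(decomposed))   # → ['U+0069 LATIN SMALL LETTER I', 'U+0301 COMBINING ACUTE ACCENT']

# Both forms are canonically equivalent: NFC recomposes the pair into U+00ED.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # → True

# Neither form contains U+0131, even if the font draws a dotless stem.
print("\u0131" in decomposed)  # → False
```

Both strings render identically, yet len() reports 1 and 2 respectively. Counting user-perceived characters (extended grapheme clusters) needs UAX #29 segmentation, which the standard library does not provide. The third-party regex module's \X pattern is one option.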