by jesuscyborg 2045 days ago
Yes. In ASCII, the backspace character (\b) acts as a combining mark by overstriking the preceding character. This convention has long been widely supported by text-formatting and display programs such as nroff and less. For example, A\b_ is A̲, and you can do the same thing with an apostrophe or tilde to add accent marks. There are also Unicode emoji sequences in which two or more codepoints are joined into a single glyph. Never underestimate the creative ways text can be used, or how often standards merely codify a long history of practice.
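A minimal Python sketch of the overstrike convention described above. It rewrites X\b_ (and the _\bX order that nroff output commonly uses) as the letter plus U+0332 COMBINING LOW LINE, and X\b' or X\b~ as the letter plus a combining acute or tilde. It then applies NFC so that pairs with a precomposed form collapse to one code point. The function name `overstrike_to_unicode` and the mapping table are hypothetical, chosen for illustration; real tools like less handle more cases than this.

```python
import unicodedata

BACKSPACE = "\b"

# Hypothetical mapping (an assumption, not a standard table):
# ASCII overlay character -> Unicode combining mark that draws the same thing.
OVERLAY_TO_COMBINING = {
    "_": "\u0332",  # COMBINING LOW LINE (underline)
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    "~": "\u0303",  # COMBINING TILDE
}


def overstrike_to_unicode(text: str) -> str:
    """Rewrite 'X<BS>Y' overstrike pairs as Unicode, then NFC-normalize."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BACKSPACE:
            base, over = text[i], text[i + 2]
            if base == over:
                # X<BS>X is double-striking (bold); keep a single copy.
                out.append(base)
            elif over in OVERLAY_TO_COMBINING:
                # X<BS>_ , X<BS>' , X<BS>~ : base letter plus combining mark.
                out.append(base + OVERLAY_TO_COMBINING[over])
            elif base == "_":
                # Underline written in the other order: _<BS>X.
                out.append(over + OVERLAY_TO_COMBINING["_"])
            else:
                # Unknown pair: keep what a terminal would show (the later char).
                out.append(over)
            i += 3
        else:
            out.append(text[i])
            i += 1
    # NFC folds pairs that have a precomposed form, e.g. e + U+0301 -> é.
    return unicodedata.normalize("NFC", "".join(out))


print(ascii(overstrike_to_unicode("A\b_")))  # 'A\u0332'
print(ascii(overstrike_to_unicode("e\b'")))  # '\xe9'
```

Mapping to combining marks rather than dropping the overlay keeps the visual intent, and the final NFC step shows where the two systems meet: some overstrikes have a precomposed Unicode equivalent, most do not.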

Er, I was asking about Unicode joining, not this roff \b thing. Sorry for the confusion. I'm aware that multiple-codepoint Unicode glyphs exist; I'm asking whether any of them involve a codepoint in the ASCII (1-127) range and cannot be normalized to a single codepoint (e.g., e + a combining acute accent normalizes to the single codepoint é).
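A quick Python check of that normalization example, using the standard-library `unicodedata` module. Note that the plain ASCII apostrophe (U+0027) is not a combining character, so it is U+0301 COMBINING ACUTE ACCENT, not ', that composes with e.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(unicodedata.name(composed))      # LATIN SMALL LETTER E WITH ACUTE

# NFD reverses the composition.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# A plain ASCII apostrophe is not a combining mark, so nothing composes.
print(unicodedata.normalize("NFC", "e'") == "e'")  # True
```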
Of course. Take for example mͫ (m + m): there's no way to represent that as a single codepoint. Combining marks can also be stacked, e.g. m͚ͫ (m + m + ∞), so the number of glyphs you can build is effectively unlimited. Only a tiny fraction of the possible combinations have a shorter, precomposed normalized form. The newer Unicode combining marks work on almost exactly the same principle as the ASCII \b combining mark, which is why I mentioned it earlier.
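A small Python sketch to test the claim with `unicodedata`. It assumes the mark written as "m" above is U+036B COMBINING LATIN SMALL LETTER M and that the ∞-like mark is U+035A COMBINING DOUBLE RING BELOW; the helper `single_codepoint_form` is hypothetical. NFC may reorder stacked marks into canonical order, but it has no precomposed character to merge these sequences into.

```python
import unicodedata


def single_codepoint_form(s: str) -> bool:
    """Hypothetical helper: True if NFC collapses s to exactly one code point."""
    return len(unicodedata.normalize("NFC", s)) == 1


# Assumed code points for the examples in the comment above.
samples = {
    "e + combining acute": "e\u0301",
    "m + combining m": "m\u036b",
    "m + combining m + double ring below": "m\u036b\u035a",
    "m + five stacked combining m": "m" + "\u036b" * 5,
}

for label, s in samples.items():
    nfc = unicodedata.normalize("NFC", s)
    print(f"{label}: {len(nfc)} code point(s), single form: {single_codepoint_form(s)}")
```

Only the first sample collapses to one code point; the others keep every mark, however many are stacked.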
Thanks!