Hacker News
by arp242 984 days ago
The thing with normalization is that it's not free, and people seem quite opposed to it, especially for embedded use cases. IIRC it requires about 100K of binary size, about 20K of memory, and some non-zero number of CPU cycles. This is negligible on your desktop computer, but for embedded use cases it matters (or so I've been told).

This comes up in specifications that have a broad range of use cases. When I was involved in this, my idea was to just spec things so that there's only one allowed form; you'll still need a small-ish table for this, but that's fine. That's currently hard, though, because in a few newer Latin-adjacent alphabets some letters cannot be represented without a combining character.
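A minimal sketch of the "only one allowed form" idea, in Python: instead of normalizing input, a hypothetical validator rejects anything that isn't already in NFC. The function name `accept_text` is made up for illustration. It uses the standard library's `unicodedata.is_normalized` (Python 3.8+), which ships the full Unicode tables, so this shows the rule rather than the small table an embedded implementation would actually carry.

```python
import unicodedata

def accept_text(s: str) -> bool:
    """Hypothetical spec rule: input must already be NFC; nothing is normalized on the fly."""
    return unicodedata.is_normalized("NFC", s)

precomposed = "\u00e9"          # é as a single code point
decomposed = "e\u0301"          # e + COMBINING ACUTE ACCENT
open_e_tilde = "\u025b\u0303"   # ɛ + COMBINING TILDE; Unicode has no precomposed form for this

print(accept_text(precomposed))   # True
print(accept_text(decomposed))    # False: the spec would reject this spelling of é
print(accept_text(open_e_tilde))  # True: its NFC form is still a combining sequence
print(len(unicodedata.normalize("NFC", open_e_tilde)))  # 2 code points
```

The last case is the catch the comment describes: a single-form rule can't simply forbid combining characters, because some letters only exist as base letter plus combining mark.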

So then you have to either "accept that two things which seem visually similar are not identical" (meh) or "exclude embedded use cases" (meh).
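The first option is easy to demonstrate. A minimal Python sketch: the two strings below render identically but compare unequal until both sides are normalized. NFC is an arbitrary choice here; comparing NFD forms would work equally well for equality.

```python
import unicodedata

a = "caf\u00e9"    # 'é' as one precomposed code point
b = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(a == b)           # False: different code point sequences
print(len(a), len(b))   # 4 5

def nfc_equal(x: str, y: str) -> bool:
    # Normalizing both sides is the step that costs tables and CPU time
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

print(nfc_equal(a, b))  # True
```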

I never really found a good way to unify these use cases. I've seen this come up a few times in various contexts over the years.

> Posted to HN several times has been the well documented proposal process from start to finish (it succeeded) of getting common and somewhat less common power symbols encoded in Unicode.

Would this work for an entirely new symbol I invent today? It's not really the Unicode people that are "difficult" here as such; they just ask for demonstrated usage, which is entirely reasonable, and that's hard to get (or: harder than it was before computers), especially for casual usage. I'm sure that if some country adopts/invents a new script today, as seems to be happening in West Africa in recent years, the Unicode people are more than amenable to working with that, but "I just like ‽" is a rather different type of thing.


> Would this work for an entirely new symbol I invent today? It's not really the Unicode people that are "difficult" here as such, they just ask for demonstrated usage, which is entirely reasonable, and that's hard to get (or: harder than it was before computers) especially for casual usage.

Sure, they want demonstrated usage inline in the flow of text, as textual elements, as opposed to purely iconography or design elements (such things are outside of Unicode's remit, modulo some old Wingdings encoded for compatibility reasons and the fine line between emoji as expressive text and emoji as useful iconography in many cases). But at this point (again in contrast to the UCS-2/no-Astral-plane days) the committees don't seem to care how it was mocked up (do it on a chalkboard, do it in paint, do it in LaTeX drawing commands, whatever gets the point across) or how "casual" or infrequent the usage is, so long as you can make the case that "this is a text element" (not an icon!) used in living, creative language expression.

There are more "provenance" requirements for dead languages, and they'll want some number of academic citations, but for living languages they've become flexible (no hard requirements) about the number of examples they need from the wild and where those are sourced from. Showing it in old classic documents/manuals/books, for instance, helps the case greatly, but the committees today no longer seem as limited in what can be used to demonstrate usage.

"I just like it" is obviously not a rock-solid proposal/defense to bring to a committee (any committee, really), but that doesn't mean it is impossible for the committee to be swayed by someone making a strong enough "I just like it" case, if they demonstrate well enough why they like it, how they use it, and how they think other people will use it (and how those uses aren't just iconography/decorative elements but useful in the inline context of textual language).