by poulpy123 25 days ago

Looking at the changes makes me wonder:

- Is there a usable font that covers all of Unicode?

- If not, is there really a point in including everything possible in Unicode?

- How much space remains for new alphabets and smileys?

- How do they handle changes in scripts, for example if new proto-cuneiform or seal script symbols are discovered?

4 comments

wongarsu 25 days ago

> If not, is there really a point in including everything possible in Unicode?

Needing to load three fonts to show a single document that mixes vastly different character sets is still infinitely better than not being able to have those different characters in the same .txt or .md file at all.

> How much space remains for new alphabets and smileys?

Unicode can encode 1,114,112 code points (U+0000 through U+10FFFF), and roughly 800k of those are currently unassigned and available for future scripts or characters.
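
If you want to check the numbers yourself, here is a rough Python sketch. It assumes that "unassigned" means general category `Cn` in the interpreter's bundled `unicodedata` tables, which also includes the 66 noncharacters, and the result depends on which Unicode version your Python ships with, so treat the figure as approximate.

```python
import sys
import unicodedata

# sys.maxunicode is 0x10FFFF, so the code space holds 0x110000 code points.
TOTAL_CODE_POINTS = sys.maxunicode + 1

# Assumption: "unassigned" is approximated by general category "Cn".
# That category also covers the 66 permanently reserved noncharacters.
unassigned = sum(
    1
    for cp in range(TOTAL_CODE_POINTS)
    if unicodedata.category(chr(cp)) == "Cn"
)

print("Unicode data version:", unicodedata.unidata_version)
print("Total code points:   ", TOTAL_CODE_POINTS)
print("Category Cn:         ", unassigned)
```

Private-use and surrogate code points have their own categories (`Co` and `Cs`), so they are not counted as unassigned here.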

xigoi 25 days ago

Also, the 1.1M limit exists because of UTF-16: surrogate pairs cannot address anything above U+10FFFF. If UTF-16 were deprecated in favor of UTF-8, the limit could be much higher.
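
To spell out where the limit comes from, here is a small Python sketch of the surrogate-pair arithmetic. The helper name `to_surrogate_pair` is made up for illustration; the ranges are the standard UTF-16 surrogate ranges.

```python
# Code points below 0x10000 (the BMP) fit in a single 16-bit unit.
BMP_SIZE = 0x10000

# Everything above the BMP is encoded as a high surrogate plus a low surrogate.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1  # 1024 values
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1   # 1024 values

# 65,536 + 1024 * 1024 = 1,114,112 addressable code points.
utf16_limit = BMP_SIZE + HIGH_SURROGATES * LOW_SURROGATES


def to_surrogate_pair(code_point):
    """Split a supplementary code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


print(utf16_limit)
print([hex(unit) for unit in to_surrogate_pair(0x10FFFF)])
```

The last code point, U+10FFFF, maps to the last possible pair (0xDBFF, 0xDFFF), which is why there is no room left in UTF-16. For comparison, the original UTF-8 design allowed sequences of up to six bytes and could reach 2^31 code points before it was restricted to match UTF-16.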

hulitu 24 days ago

We need UTF-32. For the future.

xigoi 24 days ago

UTF-32 already exists, but nobody uses it because it’s much less efficient for most textual data than UTF-8.
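
A quick way to see the difference is to encode a few strings and compare byte counts. This is only a sketch: the sample strings and the helper name `encoded_sizes` are arbitrary, and the `-le` codec variants are used so that no byte order mark is included in the count.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes of `text` for each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


# Arbitrary sample strings chosen for illustration.
samples = ["Hello", "こんにちは", "😀"]

for sample in samples:
    print(repr(sample), encoded_sizes(sample))
```

UTF-32 always spends four bytes per code point, so ASCII-heavy text (which includes most markup and source code) takes four times the space it does in UTF-8. CJK text is the case where UTF-16 comes out smaller than UTF-8, but UTF-32 is never the smallest of the three.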

pvdebbe 24 days ago

UTFv6

tecleandor 25 days ago

> How do they handle changes in scripts, for example if new proto-cuneiform or seal script symbols are discovered?

They get added in the next Unicode revision.

In Unicode you have "blocks" [0] that are often bigger than the number of characters in a script, language or function. There is usually also space for new blocks between unrelated blocks.

For example, cuneiform was introduced in Unicode 5.0, and there have been revisions in 7.0 and 8.0 [1].

  0: https://en.wikipedia.org/wiki/Unicode_block
  1: https://en.wikipedia.org/wiki/Cuneiform_(Unicode_block)#History
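
To illustrate the spare room inside a block, here is a rough Python sketch that counts how many code points of the Cuneiform block have a character name. It assumes the block spans U+12000 to U+123FF, and the helper `assigned_in_range` is made up for this example. The count depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Assumption: the Cuneiform block covers U+12000..U+123FF (1024 code points).
CUNEIFORM_START, CUNEIFORM_END = 0x12000, 0x123FF


def assigned_in_range(start, end):
    """Count code points in [start, end] that have a Unicode character name."""
    return sum(
        1
        for cp in range(start, end + 1)
        if unicodedata.name(chr(cp), "")  # empty default means "no name"
    )


block_size = CUNEIFORM_END - CUNEIFORM_START + 1
used = assigned_in_range(CUNEIFORM_START, CUNEIFORM_END)

print("Unicode data version:", unicodedata.unidata_version)
print(f"Cuneiform block: {used} of {block_size} code points assigned")
```

Whatever is left over is the room that later revisions can fill without having to move anything.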

lifthrasiir 25 days ago

As an example of Unicode "characters" that are not exactly characters: musical symbols rarely appear in running text (which is a primary litmus test for encoding), but they are typically rendered with existing font technology, so there is a need for standardized "character" codes, which is what SMuFL [1] provides. In fact, Unicode 18 will get tons of musical symbols that have been in SMuFL for a long time but are not yet in Unicode [2].

[1] https://www.smufl.org/

[2] https://www.unicode.org/L2/L2025/25017-miscellaneous-musical...

pveierland 25 days ago

The Noto fonts have great coverage: https://notofonts.github.io/overview/

infinita740 25 days ago

Pretty cool visualization.

There is also GNU Unifont [1]: "The original intent of Unifont was to offer a simple font format with wide Unicode coverage to render something meaningful for each Unicode code point"

[1] https://unifoundry.com/unifont/index.html
