| Hello. Font's author here. You and Jeff are correct in guessing this is (ab)using ligatures maximally :) To satisfy your curiosity, we can go deeper.

----

Conceptually it is simple:
1. assign a default (most likely) sound for each character,
2. loop through contexts, extracting words (char-combos) where the sound is different from the default ("alt-word")
3. create SVGs + font-paths (fallback for incompatible systems) for every char and every alt-word
4. assign a ligature to substitute each char-sequence that forms the alt-word (e.g., "when 乾 and 隆 appear adjacently, replace them with `uniF1234`", the codepoint for the alt-word 乾隆)

It is not perfect, but I didn't expect this to work so well, and was stunned when the testers reported high accuracy. I had always believed that bespoke computation with word segmentation (with some 1M frequency-attached library) and a large data-bank (100k+ words) was necessary.

----

Practically it was a horrific, tedious, mind-numbing, gawd-awful set of "why doesn't this work":
1. SVG automation that works for 10^3 breaks with 10^5
2. what worked for Latin breaks for Unicode
3. what worked for Unicode breaks for PUA
4. what worked for monochrome breaks for color
5. what worked for single glyphs breaks for ligatures
6. what?! The assignments in the database are wrong??
7. [...]

As I was trying to coerce the system to do what it wasn't designed to do, many of these breaks were undocumented and pretty mysterious to solve, and some steps just got manually gritted through. (And each of the 15k+ glyphs got gritted through about five times.) It does look pretty elegant at the end ;) |
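For anyone who wants to see the four conceptual steps in miniature, here is a minimal Python sketch. It is not the author's pipeline, which builds SVGs and real ligature tables inside a font; it only simulates the lookup logic in plain code. The names `DEFAULT_READING`, `ALT_WORDS`, and `annotate` are hypothetical, the default readings are assumed for illustration, and the greedy longest-match loop is a simplification of how a shaping engine applies ligature rules.

```python
# Toy model of "default reading per character + ligature for each alt-word".
# All data here is illustrative, not taken from the actual font.

# Step 1: one default (most likely) reading per character (assumed values).
DEFAULT_READING = {"乾": "gān", "隆": "lóng", "燥": "zào"}

# Steps 2-4: words whose reading differs from the defaults ("alt-words"),
# each mapped to one substitute glyph name, the way a ligature rule would.
# `uniF1234` is the placeholder glyph name used in the comment above.
ALT_WORDS = {"乾隆": ("uniF1234", "qián lóng")}

MAX_LEN = max(len(word) for word in ALT_WORDS)


def annotate(text):
    """Return (glyph, reading) pairs, preferring the longest alt-word match."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in ALT_WORDS:
                out.append(ALT_WORDS[chunk])
                i += n
                break
        else:
            # No alt-word starts here: fall back to the character's default.
            ch = text[i]
            out.append((ch, DEFAULT_READING.get(ch, "")))
            i += 1
    return out


print(annotate("乾隆"))  # the alt-word is replaced by a single glyph
print(annotate("乾燥"))  # no alt-word matches, so defaults are used
```

Because no word segmentation happens here, any adjacent pair that matches an alt-word gets substituted regardless of where the real word boundary falls, which is the limitation raised in the reply below.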
> Unfortunately, without being able to do proper word segmentation, this will remain a limitation.
Can the user manually add a zero width space to help?