by WorldMaker
660 days ago
Though it's also a stuck legacy throwback. Modern advice would be to not send ligatures directly to the renderer and instead let the renderer consult OpenType features (and Unicode/ICU algorithms) to build them itself. PDF's baking of some ligatures into its files seems like something of a backwards-compatibility mistake, kept around to support ancient "dumb" PostScript fonts and pre-Unicode font encodings (or at least pre-Unicode Normalization Forms). It's also partly that PDF has always been confused about whether it is the final renderer in a stack or not.
Even for English the exact tweaking of line breaking and hyphenation is a problem that requires manual intervention from time to time. In mathematics research papers it’s not uncommon to see symbols that haven’t yet made it into Unicode. Look at the state of text on the web and you’ll encounter all these problems; even Google Docs gave in and now renders to a canvas.
PDF’s Unicode handling is indeed a big mess, but it does have the ability to associate any glyph with an arbitrary Unicode string for text extraction purposes, so there’s nothing to stop the program that generates the PDF from mapping the fi ligature glyph to the two-character string “fi”.
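
To make that concrete, here is a rough Python sketch of the idea, not how any particular PDF library does it. The glyph IDs and both mapping tables are made up for illustration; in a real PDF the mapping would come from the font's ToUnicode CMap. It shows a producer mapping the ligature glyph straight to "fi", and, as a fallback for files that map it to U+FB01 instead, an extractor applying NFKC normalization.

```python
import unicodedata

# Hypothetical glyph IDs for a subsetted font; real IDs depend on the font.
# A producer that expands the ligature in its glyph-to-Unicode mapping:
TO_UNICODE_EXPANDED = {1: "fi", 2: "n", 3: "d"}

# A producer that maps the same glyph to the ligature code point U+FB01:
TO_UNICODE_LIGATURE = {1: "\ufb01", 2: "n", 3: "d"}


def extract_text(glyph_ids, to_unicode):
    """Turn a run of glyph IDs into text using a glyph-to-Unicode table.

    Glyphs with no entry become U+FFFD (replacement character).
    """
    return "".join(to_unicode.get(g, "\ufffd") for g in glyph_ids)


# The word "find" set with an fi ligature: three glyphs, four characters.
glyph_run = [1, 2, 3]

expanded = extract_text(glyph_run, TO_UNICODE_EXPANDED)
ligature = extract_text(glyph_run, TO_UNICODE_LIGATURE)

# NFKC applies the compatibility decomposition of U+FB01 into "f" + "i".
normalized = unicodedata.normalize("NFKC", ligature)
```

With the expanded table, searching the extracted text for "find" just works. With the ligature table it only works if the extractor (or the search tool) remembers to normalize, which is exactly the kind of thing that gets forgotten.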