Hacker News
by wtbob 3938 days ago
If Unicode defines the rule for initial/medial vs. final ſigma, I wonder why it doeſn't do the ſame for long vs. ſhort s.

More seriously, for encoding purposes shouldn't it be up to the application using the encoding to choose the right character, not up to the encoding system to specify the algorithm? But maybe I'm missing something.
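For what it's worth, the asymmetry is easy to see in a language that implements the Unicode default case mappings. A small sketch using only Python's built-in `str.lower()` and `str.upper()` (the variable names are just for illustration): lowercasing picks the final sigma by context, while nothing comparable happens for long s.

```python
# Python's str.lower() applies Unicode's context-sensitive Final_Sigma rule.
word = "ΟΔΟΣ".lower()   # capital sigma at the end of a word
lone = "Σ".lower()      # capital sigma with no preceding cased letter

print(word)  # οδος  (ends in final sigma, U+03C2)
print(lone)  # σ     (ordinary small sigma, U+03C3)

# There is no matching contextual rule for long s: lowercasing never
# produces U+017F, and uppercasing it simply yields a plain "S".
plain = "MISSISSIPPI".lower()
from_long_s = "ſ".upper()

print(plain)        # mississippi
print(from_long_s)  # S
```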


The Unicode case-mapping algorithm is customizable by locale (e.g. uppercase i is İ in Turkish). An application that needs long s (for German Fraktur, for instance) could use a custom locale.
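To make the Turkish example concrete, here is a minimal sketch of what such a tailoring changes. Python's standard library has no locale-aware case mapping, so `upper_tr` and `lower_tr` are hypothetical helpers written for this comment, not a real API; a library such as ICU would normally do this from locale data.

```python
# Hypothetical helpers that emulate the Turkish tailoring of dotted and
# dotless i on top of Python's locale-independent default case mappings.

def upper_tr(text: str) -> str:
    # Turkish: i -> İ (U+0130). Dotless ı -> I is already the default mapping.
    return text.replace("i", "\u0130").upper()

def lower_tr(text: str) -> str:
    # Turkish: İ -> i and I -> ı (U+0131). Replace before calling lower(),
    # because the default mapping lowercases İ to "i" plus a combining dot.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

print("istanbul".upper())    # ISTANBUL  (default mapping)
print(upper_tr("istanbul"))  # İSTANBUL  (Turkish tailoring)
print(lower_tr("ISPARTA"))   # ısparta
```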
Specifying the rules for German Fraktur as executable code will be fun, as it uses the short s at the end of syllables, not words. So it is "Häschen" and "Häſcher"...
Unicode has things for this: https://en.wikipedia.org/wiki/Zero-width_non-joiner

And customization is done declaratively, not imperatively.
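A rough sketch of how the zero-width non-joiner could carry the syllable information. It is written imperatively only to keep it short, and it assumes a much simplified rule: "s" becomes "ſ" whenever a letter follows it directly, and the author of the text puts U+200C after an "s" that ends a syllable. `to_long_s` is a hypothetical name, and real Fraktur orthography has more cases than this.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, used here as a syllable-boundary mark

def to_long_s(text: str) -> str:
    """Replace 's' with long s unless it ends a word or precedes a ZWNJ."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # ZWNJ, spaces, punctuation and end of text are not letters,
        # so an 's' in front of any of them stays a short s.
        if ch == "s" and nxt.isalpha():
            out.append("ſ")
        else:
            out.append(ch)
    return "".join(out)

print(to_long_s("Häscher"))           # Häſcher
print(to_long_s("Häs" + ZWNJ + "chen"))  # Häs + ZWNJ + chen, short s kept
print(to_long_s("Haus"))              # Haus
```

The point is that the application never has to find syllable boundaries itself; the boundary is encoded in the text and the mapping only has to respect it.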