The Curious Case of the Greek Final Sigma (knight666.com)
26 points by knight666 3938 days ago
2 comments

Apparently that work has been outdated (or incomplete) since 2010:

https://en.wikipedia.org/wiki/Capital_%E1%BA%9E

"Capital sharp s (ẞ) [...] never occurs word-initially in German text, and traditional German printing (which used blackletter) never used all-caps. When using all-caps, current spelling rules require the replacement of ß with SS.[1] However, in 2010 the use of the capital sharp s became mandatory in official documentation when writing geographical names in all-caps."

If Unicode defines the rule for initial/medial vs. final ſigma, I wonder why it doeſn't do the ſame for long vs. ſhort s.

More seriously, for encoding purposes shouldn't it be up to the application using the encoding to choose the right character, not up to the encoding system to specify the algorithm? But maybe I'm missing something.
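
For anyone who wants to see the rule being discussed, here is a minimal sketch using only Python's built-in `str.lower()`, which applies the context-sensitive final-sigma condition from the Unicode case-mapping rules. The variable names are just for illustration.

```python
# Capital sigma lowercases differently depending on its position in the word.
word = "ΟΔΟΣ"
lowered = word.lower()

# The last letter becomes final sigma (U+03C2), not medial sigma (U+03C3).
print(lowered)                     # οδος
print(lowered[-1] == "\u03c2")     # True

# A sigma that is not preceded by a cased letter gets the ordinary form.
lone = "Σ".lower()
print(lone == "\u03c3")            # True
```

So the choice between σ and ς is made by the case-mapping step here, not by the application.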

The Unicode case-mapping algorithm is customizable by locale (e.g. uppercase i is İ in Turkish). An application that needs long s (for German Fraktur, for instance) could use a custom locale.
Specifying the rules for German Fraktur as executable code will be fun, as it uses the short s at the end of syllables, not words. So it is "Häschen" and "Häſcher"...
Unicode has things for this: https://en.wikipedia.org/wiki/Zero-width_non-joiner
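
A rough sketch of how that could look, assuming the input text already carries a zero-width non-joiner at each syllable boundary where the round s must stay. The function name `to_long_s` is made up for this example, and the rule is heavily simplified (it ignores everything about real Fraktur orthography except "long s unless a boundary follows").

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, used here as a boundary marker (assumption)

def to_long_s(text: str) -> str:
    """Hypothetical helper: turn 's' into 'ſ' unless a boundary follows it."""
    # An 's' becomes long only when a letter follows it directly.
    # End of word, punctuation, or a ZWNJ are not letters, so the round s stays.
    return re.sub(r"s(?=[^\W\d_])", "ſ", text)

print(to_long_s("Häscher"))                  # Häſcher
print(to_long_s("Häs" + ZWNJ + "chen"))      # round s kept before the ZWNJ
print(to_long_s("Haus"))                     # Haus
```

The hard part, deciding where the syllable boundaries are, is pushed onto whoever inserts the ZWNJ, which is the point of having the character.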

And customization is done declaratively, not imperatively.
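
To illustrate what "declarative" can mean here: Python's built-in `str.upper()` and `str.lower()` are locale-independent, so the Turkish behavior has to be layered on top. Below is a minimal sketch that expresses the exceptions as data (a translation table) rather than as branching code. The names `TR_UPPER`, `TR_LOWER`, `upper_tr` and `lower_tr` are invented for this example and are not part of any library.

```python
# Locale-specific exceptions expressed as data, applied before the default mapping.
TR_UPPER = {ord("i"): "İ"}                 # dotted i keeps its dot when uppercased
TR_LOWER = {ord("I"): "ı", ord("İ"): "i"}  # dotless and dotted capitals

def upper_tr(text: str) -> str:
    """Uppercase with the Turkish exceptions applied first (sketch)."""
    return text.translate(TR_UPPER).upper()

def lower_tr(text: str) -> str:
    """Lowercase with the Turkish exceptions applied first (sketch)."""
    return text.translate(TR_LOWER).lower()

print("istanbul".upper())        # ISTANBUL (default, locale-independent)
print(upper_tr("istanbul"))      # İSTANBUL
print(lower_tr("DİYARBAKIR"))    # diyarbakır
```

Adding another locale then means adding another table, not another code path.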