Hacker News
by taneq 301 days ago
If you're playing at this level, you need to define:

- letter

- word

- 5 :P


Eh, in Macedonian they have some letters that in Russian are just two separate letters.
In German you have the same, only within one language. ß can be written as ss if it isn't available in a font, and only in 2017 did they add a capital version. So depending on the font and the Unicode version, the number of letters can differ.
"Traditionally, ⟨ß⟩ did not have a capital form, and was capitalized as ⟨SS⟩. Some type designers introduced capitalized variants. In 2017, the Council for German Orthography officially adopted a capital form ⟨ẞ⟩ as an acceptable variant, ending a long debate."

Thanks, that is interesting!

Should "ß" == "ss" evaluate as true?
I don't see why it should. I also believe parent is wrong, as there are unambiguous rules about when to use ß or ss.

Never thought of it, but maybe there are rules that allow the code point for ß to be rendered as ss? At least (from experience as a user) there seems to be a single "ss" code point.

> also believe parent is wrong as there are unambiguous rules about when to use ß or ss.

I never said it was ambiguous, I said it depends on the Unicode version and the font you are using. How is that wrong? (It seems the capital of ß is still SS in the latest Unicode, but since ẞ is the preferred capital version now, this should change in the future.)
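
For anyone who wants to poke at this, here is a minimal Python sketch of the case mapping being discussed. It only shows what Python's built-in `str` methods do (they follow the Unicode case mappings shipped with the interpreter); the variable names are just for illustration, and other languages or libraries may behave differently.

```python
# Plain comparison works code point by code point, so these are not equal.
plain_equal = "ß" == "ss"

# Uppercasing ß uses the full case mapping, which yields two letters.
upper_form = "ß".upper()

# The number of code points in a word therefore changes with case.
lower_len = len("Straße")
upper_len = len("Straße".upper())

# The capital ẞ (U+1E9E) exists as its own code point and lowercases to ß.
capital_lowered = "\u1e9e".lower()

print(plain_equal)           # False
print(upper_form)            # SS
print(lower_len, upper_len)  # 6 7
print(capital_lowered)       # ß
```

So the "number of letters" really does depend on which form of the word you count.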

> How is that wrong?

Not sure where, how or if it's defined as part of Unicode, but so far I assumed that for a Unicode grapheme there exists a notion of what the visual representation should look like. If Unicode still defines the capital of ß as SS, that's an error in Unicode due to slow adoption of the changes in the German language.
ẞ is not the preferred capital version, it is an acceptable variant (according to the Council for German Orthography).
Well, I don't speak German; I was asking.
I see, it wasn't clear to me on what level you were asking. The letter ß has never been generally equivalent to ss in the German language.

From a user experience perspective though it might be beneficial to pretend that "ß" == "ss" holds when parsing user input.
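
A rough sketch of what that could look like in Python, assuming case folding is an acceptable notion of "close enough" for your input. `loose_key` and `loosely_equal` are hypothetical helper names, not from any library; `str.casefold` and `unicodedata.normalize` are standard.

```python
import unicodedata


def loose_key(text: str) -> str:
    """Hypothetical helper: build a key for forgiving comparison of user input."""
    # Normalize first so composed and decomposed forms compare equal,
    # then case-fold, which maps ß (and capital ẞ) to "ss".
    return unicodedata.normalize("NFC", text).casefold()


def loosely_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their folded keys."""
    return loose_key(a) == loose_key(b)


print(loosely_equal("Straße", "STRASSE"))  # True
print(loosely_equal("ß", "ss"))            # True
print("ß" == "ss")                         # False
```

This only pretends the two are equal for matching purposes; the stored text stays whatever the user typed.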

That's not really any different from the distinction (or lack thereof) between "ae" and "æ". For that matter, in Russian there is a letter "ы" which is historically a digraph consisting of two separate letters, "ъ" and "i", and which has been treated as a single letter for so long that few people would even recognize it as a digraph. This kind of stuff is all language-specific, which is why for Wordle etc. you always need to be aware of the context, and this context will then unambiguously decide what constitutes a single letter.
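
To make "the context decides" concrete, here is a small Python sketch of a letter counter where the caller supplies which multi-character sequences count as one letter. `count_letters` and its `multi_letters` parameter are made up for illustration, and the greedy longest-match rule is an assumption; real orthographies can need more than that.

```python
def count_letters(word: str, multi_letters: frozenset = frozenset()) -> int:
    """Hypothetical helper: count letters, treating each sequence in
    multi_letters as a single letter (greedy, longest match first)."""
    # Try longer sequences before shorter ones at each position.
    candidates = sorted(multi_letters, key=len, reverse=True)
    count = 0
    i = 0
    while i < len(word):
        for seq in candidates:
            if seq and word.startswith(seq, i):
                i += len(seq)  # consume the whole sequence as one letter
                break
        else:
            i += 1  # no sequence matched: one code point, one letter
        count += 1
    return count


print(count_letters("aether"))                     # 6
print(count_letters("aether", frozenset({"ae"})))  # 5
```

Same string, different letter count, depending purely on the rules you pass in.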
Niße. ;)