Hacker News
by p_l 702 days ago
"Ą" is a separate letter in the Polish alphabet, not an accented variant of "A".

There are writing systems where combining accents are used to represent just a variation on a letter. Use of combining characters for "Ą" (and "Ć" and "Ł" and many other so-called "Polish letters") is, at best, a historical artefact of trying to write them in deficient encodings.


It doesn't matter that it's a separate letter in the alphabet. You're denying the obvious: it IS an accented (or ogonek'ed) variant of A. Unicode gives you two ways to encode it: as a single precomposed code point (U+0104), or as the base letter A (U+0041) followed by a combining ogonek (U+0328).

There is no semantic difference, just an encoding one. The end result looks the same and means the same thing. (Well, to a point: it still depends on the context, like what language you mean. But within the same context it's the same thing, and Unicode even defines rules, called canonical equivalence, for treating both forms as equal in search and so on.)
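To make that concrete, here's a minimal Python sketch using the standard `unicodedata` module. The helper name `canonically_equal` is made up for illustration, and comparing after NFC normalization is just one reasonable way to test for equivalence, not the only one.

```python
import unicodedata

# Two encodings of the same letter.
precomposed = "\u0104"   # LATIN CAPITAL LETTER A WITH OGONEK
combining = "A\u0328"    # LATIN CAPITAL LETTER A + COMBINING OGONEK


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ in length and do not compare equal...
print(len(precomposed), len(combining))
print(precomposed == combining)

# ...but they match once both are normalized to the same form.
print(canonically_equal(precomposed, combining))
```

A naive `==` or substring search sees two different strings here, which is why search code usually normalizes both the query and the text first.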

And precomposed characters are just the same kind of historical deficiency: you could just as well have designed a more compact encoding with no precomposed letters, only combinations.

This is correct, and you can look into Unicode Normalization Form C (NFC) to find the conversion and equivalence rules.
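If you want to poke at the normalization forms yourself, a rough sketch with Python's standard `unicodedata` module looks like this. The `code_points` helper and the sample string are my own, purely for illustration.

```python
import unicodedata


def code_points(text):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "\u0104\u0106"  # "ĄĆ" written with precomposed code points

# NFD splits each letter into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", composed)

# NFC recombines them into the precomposed code points.
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_points(composed))
print(code_points(decomposed))
print(recomposed == composed)

# The canonical decomposition mapping recorded for "Ą" in the Unicode database.
print(unicodedata.decomposition("\u0104"))
```

NFD goes from precomposed to base-plus-combining, and NFC goes back, so a round trip returns the original string.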