by nereye
1164 days ago
> Korean has couple of dozen of symbols in its alphabet.

While that is true (14 consonants, 10 vowels [0]), there are encodings for Korean that work at the syllable level (where each syllable contains one or two consonants and one vowel), and there are over 10000 syllable combinations (e.g. 11172 code points listed in Unicode, see [1]).

[0] In practice more, both to cater for modern and obsolete forms and to distinguish forms by their position, i.e. with separate encodings for leading vs. trailing consonants etc.

[1] https://en.wikipedia.org/wiki/Hangul_Syllables
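
For anyone wondering where 11172 comes from, here is a rough sketch of the arithmetic. The counts of 19 leading consonants, 21 vowels and 28 trailing slots (27 trailing consonants plus "none") are taken from the Unicode Hangul Syllables block layout, not from the comment above, and the names `decompose` and `compose` are made up for illustration.

```python
# Layout of the Unicode Hangul Syllables block (U+AC00..U+D7A3).
S_BASE = 0xAC00   # first precomposed syllable, "가"
L_COUNT = 19      # leading consonants (14 basic + 5 doubled)
V_COUNT = 21      # vowels (10 basic + 11 compound)
T_COUNT = 28      # trailing consonants, with index 0 meaning "no trailing consonant"
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 19 * 21 * 28 = 11172


def decompose(syllable: str) -> tuple:
    """Split one precomposed syllable into (lead, vowel, trail) indices."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    trail = index % T_COUNT
    return lead, vowel, trail


def compose(lead: int, vowel: int, trail: int = 0) -> str:
    """Inverse of decompose: build the syllable from its three indices."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)


if __name__ == "__main__":
    print(S_COUNT)
    print(decompose("한"))
    print(compose(18, 0, 4))
```

So the syllable block is just a dense enumeration of every (lead, vowel, optional trail) triple, which is why a couple of dozen letters turns into over ten thousand code points.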
|
(But I guess I also won't be surprised if the OpenAI guys can't write algorithms worth spit if it's not a large matrix multiplication.)