HN Mirror



	by nickez 467 days ago
	Found an error immediately: "Any lowercase character" doesn't match all Swedish lowercase characters.

2 comments

iugtmkbdfil834 467 days ago

Ok. This sounds like an interesting detour. Can you elaborate on that one? I doubt I will ever use that knowledge, but it sounds like it is worth knowing anyway.


Tryk 467 days ago

https://en.wikipedia.org/wiki/Swedish_alphabet


lalaithion 467 days ago

The author says “any lowercase character” but they mean “any character between the character ‘a’ and the character ‘z’”, which happens to correspond to the lower case letters in English but doesn’t include ü, õ, ø, etc.
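A minimal sketch of that point, assuming Python's standard `re` module (the thread itself names no language): `[a-z]` is a range of code points from `a` to `z`, so the three extra Swedish letters fall outside it even though each of them is lowercase.

```python
import re

# [a-z] is a code point range (U+0061..U+007A), not "any lowercase letter".
ascii_lower = re.compile(r"[a-z]")

# The three letters the Swedish alphabet adds after z.
swedish_extra = ["å", "ä", "ö"]

# Does the range match each letter?
matches = {ch: bool(ascii_lower.fullmatch(ch)) for ch in swedish_extra}
print(matches)  # {'å': False, 'ä': False, 'ö': False}

# Each of them is nonetheless lowercase according to Unicode.
print(all(ch.islower() for ch in swedish_extra))  # True
```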


comrade1234 467 days ago

lol really? Why not? Is that true for all encodings? Is it a bug or a feature? What about a simple character set like GSM-7 Swedish?


lalaithion 467 days ago


Someone 467 days ago

> but they mean “any character between the character ‘a’ and the character ‘z’”, which happens to correspond to the lower case letters in English

‘Only’ in the most commonly used character encodings. In EBCDIC (https://en.wikipedia.org/wiki/EBCDIC), the [a-z] range includes more than 26 characters.
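The EBCDIC claim can be checked with a short sketch. It assumes Python's built-in `cp037` codec, which is only one of several EBCDIC code pages; other pages differ in detail. The byte values of `a` and `z` span more than 26 positions because the letters sit in three separate runs with other characters in between.

```python
# cp037 is one EBCDIC code page shipped with Python; chosen here as an assumption.
lo = "a".encode("cp037")[0]  # byte value of 'a' in this code page
hi = "z".encode("cp037")[0]  # byte value of 'z' in this code page

# Number of byte values covered by the a..z range in this encoding.
span = hi - lo + 1
print(hex(lo), hex(hi), span)  # 0x81 0xa9 41

# Decode every byte in the range and keep the ones that are not a..z.
in_range = bytes(range(lo, hi + 1)).decode("cp037")
non_letters = [ch for ch in in_range if not ("a" <= ch <= "z")]
print(len(non_letters))  # 15
```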

That’s one of the reasons POSIX has character classes (https://en.wikipedia.org/wiki/Regular_expression#Character_c...). [:lower:] gets you the lowercase characters as defined by the locale and encoding that the program uses.
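Python's standard `re` module does not support POSIX classes such as `[:lower:]`, so the sketch below uses `str.islower()` as a rough analogue. That is a substitution, not the POSIX mechanism: it follows Unicode character properties rather than the process locale. The helper name `lowercase_letters` and the sample string are made up for illustration.

```python
def lowercase_letters(text):
    """Return the characters of text that Unicode classifies as lowercase."""
    return [ch for ch in text if ch.islower()]

# Hypothetical sample containing Swedish letters in both cases.
sample = "Smörgåsbord Äpple"
found = lowercase_letters(sample)

print("ö" in found, "å" in found)  # True True
print("Ä" in found, "S" in found)  # False False
```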


comrade1234 467 days ago

I would expect [a-z] to mean any lowercase in any language, not lowercase but only a to z. So I’d get bitten by that one.


deciduously 467 days ago

The letters with diacritics have code points after 'z', so it does stand to reason they wouldn't appear in that range.
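A small sketch of the ordering, assuming Python strings holding the precomposed (single code point) forms of the letters: each of å, ä, ö has a code point above `z`, which is why a plain code point sort puts them last. Note that this order (ä before å) is not the Swedish alphabetical order, which is å, ä, ö.

```python
# Code points of 'z' and the three extra Swedish letters (precomposed forms).
letters = ["z", "å", "ä", "ö"]
code_points = {ch: f"U+{ord(ch):04X}" for ch in letters}
print(code_points)  # {'z': 'U+007A', 'å': 'U+00E5', 'ä': 'U+00E4', 'ö': 'U+00F6'}

# All three sit above 'z', so they fall outside the a..z range.
after_z = all(ord(ch) > ord("z") for ch in ["å", "ä", "ö"])
print(after_z)  # True

# Python's default string sort compares code points, not Swedish collation.
by_code_point = sorted(["ö", "å", "z", "ä", "a"])
print(by_code_point)  # ['a', 'z', 'ä', 'å', 'ö']
```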


criddell 467 days ago

The Swedish alphabet includes characters outside of the a-z range.
