| HN Mirror



	by snnn 789 days ago
	But the reality is: most glibc functions like `dirname` cannot handle non-UTF-8 encodings, because some encodings (like GBK) overlap with ASCII. That means when you search for an ASCII char (like '\') in a char array, you may accidentally hit one half of a multibyte, non-English character. Therefore, people in Asia usually do not use non-UTF-8 locales.
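
To make the overlap concrete, here is a minimal Python sketch. It assumes Python's built-in `gbk` codec as a stand-in for a GBK locale; `naive_find` and the sample name are hypothetical, and nothing here is glibc code. It builds a two-byte GBK character whose second byte is 0x5C, the ASCII code for '\', and shows that a plain byte search lands inside that character.

```python
# Sketch: a byte-level search for '\' (0x5C) can land inside a two-byte GBK character.
# Assumption: Python's "gbk" codec stands in for a GBK locale; this is not glibc code.

# 0x81 0x5C lies in the two-byte GBK range; its second byte equals ASCII '\'.
raw = bytes([0x81, 0x5C])
char = raw.decode("gbk")  # a single character stored as two bytes


def naive_find(buf: bytes, needle: bytes) -> int:
    """Hypothetical helper: search byte by byte, ignoring character boundaries."""
    return buf.find(needle)


# A name that contains no backslash character at all...
name = "dir" + char + "name"
encoded = name.encode("gbk")

# ...still yields a hit, because the search matches the trail byte of `char`.
hit = naive_find(encoded, b"\\")

print("backslash in text:", "\\" in name)
print("byte offset of false hit:", hit)
```

The search reports a match even though the decoded text contains no backslash, which is the failure mode described above for '\'.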

1 comment

Karellen 789 days ago

Why would you search for an ASCII char like '\' in a char array containing non-ASCII-based text, on a system with a non-ASCII-based locale?

snnn 789 days ago

Because that's how "dirname(3)" is implemented in glibc, except that it searches for '/' instead of '\'. The same code is used for every character encoding.

Karellen 788 days ago

But the byte '/' can never be part of any filename/dirname under a UNIX filesystem. Which kinda sucks generally for anyone wanting to use a charset like that, but doesn't it also mean that should never be a problem for `dirname()`?

I'm struggling to imagine how this failure would manifest. Can you give an example of how dirname() would fail? What combination of existing file/directory name, and usage of that function, would not work as expected?

Edit: I'm also a bit confused how this counts as being a problem for "modern Linux systems" - wouldn't it have always been a problem for all Unix-based OSs?
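
One way to probe the question is to check which bytes can actually appear as the second byte of a two-byte GBK character. The sketch below is only an illustration under stated assumptions: it uses Python's `gbk` codec rather than a real locale, `dirname_bytes` is a simplified hypothetical stand-in and not glibc's implementation, and it says nothing about other encodings.

```python
# Sketch: which ASCII bytes can be the trail byte of a two-byte GBK character?
# Assumptions: Python's "gbk" codec approximates the encoding, and dirname_bytes
# is a simplified stand-in for a byte-level dirname, not glibc's code.


def dirname_bytes(path: bytes) -> bytes:
    """Return everything before the last '/' byte, ignoring the encoding."""
    idx = path.rfind(b"/")
    if idx < 0:
        return b"."
    if idx == 0:
        return b"/"
    return path[:idx]


def gbk_chars_with_trail(trail: int) -> list:
    """Collect two-byte GBK characters whose second byte equals `trail`."""
    found = []
    for lead in range(0x81, 0xFF):
        try:
            found.append(bytes([lead, trail]).decode("gbk"))
        except UnicodeDecodeError:
            pass  # this lead/trail pair is not a valid GBK character
    return found


slash_hits = gbk_chars_with_trail(0x2F)      # '/'
backslash_hits = gbk_chars_with_trail(0x5C)  # '\'

print("characters with '/' as trail byte:", len(slash_hits))
print("characters with '\\' as trail byte:", len(backslash_hits))

# The byte-level dirname applied to a GBK-encoded path.
path = "目录/文件".encode("gbk")
print(dirname_bytes(path).decode("gbk"))
```

Comparing the two counts shows whether the overlap that affects '\' also affects '/', which is the byte `dirname()` searches for according to the comment above.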