by snnn 789 days ago
But the reality is: most glibc functions like `dirname` cannot handle non-UTF-8 encodings, because some encodings (like GBK) overlap with ASCII. That means when you search for an ASCII char (like '\') in a char array, you may accidentally hit half of a non-English character. Therefore, people in Asia usually do not use non-UTF-8 locales.

Why would you search for an ASCII char like '\' in a char array containing non-ASCII-based text, on a system with a non-ASCII-based locale?
Because that's how "dirname(3)" is implemented in glibc, except that it searches for '/' instead of '\'. The same code path is used for every character encoding.
But the byte '/' can never be part of any file or directory name on a UNIX filesystem. Which kinda sucks generally for anyone wanting to use a charset like that, but doesn't it also mean this should never be a problem for `dirname()`?

I'm struggling to imagine how this failure would manifest. Can you give an example of how dirname() would fail? What combination of existing file/directory name, and usage of that function, would not work as expected?

Edit: I'm also a bit confused about how this counts as a problem for "modern Linux systems" - wouldn't it have always been a problem for all Unix-based OSs?