|
|
|
|
|
by stonekyx
1173 days ago
|
|
I think I observed this problem recently too, but with Japanese file names. Basically I had a Git repository and didn’t know about `git config core.precomposeUnicode`. And when the repository is synchronized to a Linux system, on the Linux side it can sometimes have 2 files with the same-looking file name but different normalizations. (Because I think ext4 doesn’t normalize Unicode?) That took me about an afternoon to fix. |
|
Yeah, native Linux filesystems don't really know anything about unicode at all at their heart. They really work more on raw bytes, simply reserving 0x00 (NUL) and 0x2F ('/'). Anything else goes in a filename as far as the kernel is concerned (including evil stuff like incomplete multibyte sequences, or other invalid UTF-8 byte sequences). User space is welcome to and encouraged to treat filenames as UTF-8 on modern systems, but that's not really enforced anywhere strongly.