by pishpash
3082 days ago
Then it sounds like 'find' f'ed up, if these things are not escaped properly when they are passed around (not saying this is the case). It's just like today with various charsets: whenever there is a charset boundary, say between bytes and C library strings, which is what this is, there has to be a charset conversion.
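To make the "conversion at the boundary" point concrete, here is a minimal Python sketch. It has nothing to do with how 'find' itself works; the filename and the helper names `to_text` and `to_bytes` are made up for illustration. It assumes the text side is UTF-8 and uses Python's `surrogateescape` error handler so that bytes which are not valid UTF-8 survive the round trip instead of being dropped or mangled.

```python
def to_text(raw: bytes) -> str:
    """Cross the boundary from raw filename bytes to text, keeping undecodable bytes."""
    return raw.decode("utf-8", "surrogateescape")


def to_bytes(text: str) -> bytes:
    """Cross back from text to the exact original filename bytes."""
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    # Hypothetical filename containing a byte (0xFF) that is not valid UTF-8.
    raw_name = b"report-\xff.txt"
    as_text = to_text(raw_name)
    # The round trip is lossless only because both directions convert explicitly.
    print(to_bytes(as_text) == raw_name)
```

Skipping the explicit conversion on the way back (a plain `as_text.encode("utf-8")`) raises `UnicodeEncodeError`, which is roughly the kind of failure you get when one side of the boundary forgets to convert.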
The UNIX filesystem, qua filesystem, doesn't have a character set, just NUL-terminated strings. On the plus side, it's simple to handle, and means that retrofitting UTF-8 or another encoding is pretty easy. On the downside, two bytestrings that Unicode-canonicalize to the same value may name different files, which is surprising for humans.
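A quick Python sketch of that downside, assuming UTF-8 filenames; the name "café" and the helpers `encoded_forms` and `count_distinct_files` are only for illustration. The same visible name is encoded once in NFC (precomposed é) and once in NFD (e plus a combining accent), and both byte strings are then used as filenames in a scratch directory.

```python
import os
import tempfile
import unicodedata
from typing import List, Tuple


def encoded_forms(name: str) -> Tuple[bytes, bytes]:
    """Return the UTF-8 bytes of `name` in NFC and in NFD form."""
    nfc = unicodedata.normalize("NFC", name).encode("utf-8")
    nfd = unicodedata.normalize("NFD", name).encode("utf-8")
    return nfc, nfd


def count_distinct_files(names: List[bytes]) -> int:
    """Create an empty file per byte-string name in a temp dir; return the entry count."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)
        for raw in names:
            # Passing bytes paths hands the name to the kernel unmodified.
            with open(os.path.join(tmp_bytes, raw), "wb"):
                pass
        return len(os.listdir(tmp_bytes))


if __name__ == "__main__":
    nfc, nfd = encoded_forms("café")
    print(nfc, nfd)                         # two different byte strings
    print(count_distinct_files([nfc, nfd]))
```

On a filesystem that treats names as opaque bytes (typical Linux filesystems, as far as I know) the second print should report two files. Filesystems that normalize or compare names Unicode-aware, as Apple's do, may report one, so treat the count as dependent on where you run it.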
It's notable that many of early UNIX's competitors were much more full-fledged systems, featuring record-oriented files and typed data instead of UNIX's bytestrings-everywhere approach.