> On UNIX, paths are UTF-8 by convention, but not forced to be valid. On UNIX, paths are a sequence of bytes, with two bytes being sacred to the kernel (0x2F, used to separate path elements, and 0x00, used to terminate paths) and no other bytes being interpreted in any way. Any character encoding which respects the sacred bytes by not using them to encode any other characters is therefore usable to make UNIX paths; in fact, a UNIX path can contain multiple encodings, as long as they're all suitably respectful. That requirement for respect means that UTF-16 and UCS-2 and UCS-4 are not suitable. UTF-7 is, however, as is UTF-8, and all of the ISO/IEC 8859 encodings are as well, not to mention a whole raft of non-standard "extended ASCII" character sets. In theory, UTF-16 in some suitably respectful encoding would work, too, but gouge my eyes out with a goddamned spoon.
I.e., their comment is an RFC "OUGHT TO", not an RFC "MUST".
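
To make the "sacred bytes" rule concrete, here is a rough Python sketch of the check the quote describes: encode a file name and see whether any 0x2F or 0x00 byte shows up that wasn't a literal `/` or NUL in the text. The helper name `respects_sacred_bytes` and the sample strings are mine, purely for illustration, and it only tests one name at a time rather than proving anything about an encoding as a whole.

```python
# The two bytes the kernel interprets: '/' separates path elements,
# NUL terminates the path. (Name chosen for this sketch.)
SACRED = {0x2F, 0x00}


def respects_sacred_bytes(name: str, encoding: str) -> bool:
    """Return True if encoding `name` yields no 0x2F or 0x00 bytes.

    Assumes `name` itself contains no literal '/' or NUL, so any sacred
    byte in the output was produced by the encoding, not by the text.
    """
    return SACRED.isdisjoint(name.encode(encoding))


if __name__ == "__main__":
    for enc in ("utf-8", "iso-8859-1", "utf-16-le", "utf-32-le"):
        print(enc, respects_sacred_bytes("café", enc))
```

UTF-8 and ISO 8859-1 pass for this name, while UTF-16 and UTF-32 fail on the very first ASCII letter, because they pad it with 0x00. One thing worth checking before leaning on the quote's UTF-7 claim: Python's `utf-7` codec uses the standard base64 alphabet, which includes `/`, so `respects_sacred_bytes("\u03ff", "utf-7")` comes back `False`.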