by enriquto
1538 days ago
Indeed. The unix world would be a much happier place if the creat system call normalized the strings it receives to replace literal spaces with non-breaking spaces, and similar stuff. Regular users wouldn't notice, and it would simplify tons of shell scripts.

Technically, the creat system call doesn’t receive strings; it receives byte arrays. Nowadays, the convention is to assume those bytes are text encoded in some ISO-8859 variant or UTF-8, but that is only a convention.
Historically, passing raw bytes was the (somewhat) right decision because it meant file systems didn’t need to know much about character encodings (they only needed to know the byte value of ‘/’ and that zero is the name terminator), giving you a nice separation of concerns.
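
To make the "only two special bytes" point concrete, here is a minimal sketch of the check a byte-oriented file system needs for a single name component. The function name is hypothetical, and real systems add further limits (a maximum name length, the reserved names "." and "..") that are left out here.

```python
def is_valid_component(name: bytes) -> bool:
    """Check one file name under the byte-oriented rule sketched above.

    Hypothetical helper: the only requirements are that the name is
    non-empty and contains neither the b'/' separator nor a NUL byte.
    No character encoding is consulted at any point.
    """
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


# A name with spaces is fine, and so is one that is not valid UTF-8.
print(is_valid_component(b"hello world"))
print(is_valid_component(b"\xff\xfe"))
# The separator and the terminator are the only bytes that are rejected.
print(is_valid_component(b"a/b"))
print(is_valid_component(b"a\x00b"))
```

Note that the check never decodes anything, which is exactly the separation of concerns described above.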
With Unicode, if you want to normalize names on write, or even only reject incorrectly normalized names, or have case-insensitive file names, your file system code needs to know a lot of Unicode (normalization forms, case folding). That can be problematic on small embedded systems.
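
As a rough illustration of what "knowing Unicode" involves, the sketch below shows two byte sequences that render as the same "é" and what normalizing or rejecting them would look like. The helper names are hypothetical, and the choice of UTF-8 plus NFC is an assumption made for the example, not something any particular file system is claimed to do.

```python
import unicodedata


def normalize_name(raw: bytes) -> bytes:
    """Hypothetical normalize-on-write step: decode as UTF-8, apply NFC,
    re-encode. Raises UnicodeDecodeError if the bytes are not valid UTF-8."""
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text).encode("utf-8")


def is_normalized_name(raw: bytes) -> bool:
    """Hypothetical reject-on-write check: accept only names that are
    valid UTF-8 and already in NFC."""
    try:
        return normalize_name(raw) == raw
    except UnicodeDecodeError:
        return False


# The same visible character, stored two different ways.
composed = "\u00e9".encode("utf-8")     # single code point U+00E9
decomposed = "e\u0301".encode("utf-8")  # 'e' followed by combining acute

print(composed, decomposed)             # different byte sequences
print(normalize_name(decomposed) == composed)
print(is_normalized_name(decomposed))
```

In Python the tables behind `unicodedata.normalize` come with the standard library; a file system would have to carry equivalent data itself, which is where the cost for small systems comes from.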