Hacker News
by loeg 992 days ago
It's an oversimplification to say Linux uses UTF-8 for display. Linux just stores bags of bytes and leaves interpretation to userspace. You could store paths in ISO-8859-1 if you wanted. The only special bytes are '\0' and '/'.
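
A minimal Python sketch of the "bag of bytes" point, assuming only the two rules stated above. The helper names `is_valid_component` and `is_utf8` are made up for illustration, and the check ignores length limits and the special names `.` and `..`:

```python
def is_valid_component(name: bytes) -> bool:
    """Accept any non-empty byte string that avoids NUL and '/'."""
    return len(name) > 0 and b"\0" not in name and b"/" not in name


def is_utf8(name: bytes) -> bool:
    """Report whether the bytes happen to decode as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_name = "café.txt".encode("iso-8859-1")  # b'caf\xe9.txt'
utf8_name = "café.txt".encode("utf-8")         # b'caf\xc3\xa9.txt'

# Both are acceptable path components, but only one is valid UTF-8.
print(is_valid_component(latin1_name), is_utf8(latin1_name))
print(is_valid_component(utf8_name), is_utf8(utf8_name))

# Userspace has to cope with the undecodable bytes somehow. Python's
# "surrogateescape" error handler keeps them so the name round-trips.
as_text = latin1_name.decode("utf-8", "surrogateescape")
roundtrip = as_text.encode("utf-8", "surrogateescape")
print(roundtrip == latin1_name)
```

The ISO-8859-1 name passes the byte-level check even though it is not valid UTF-8, which is the sense in which interpretation is left to userspace.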
2 comments

Not only could you, this actually happens in practice. Not necessarily with ISO-8859-1, but specifically with SHIFT-JIS, a Japanese encoding that you will run into if you run old Japanese software. To make things even worse, SHIFT-JIS is almost entirely incompatible with any UTF-based encoding, and depending on the attempted normalisation you can quickly end up with paths that have been messed up multiple times in a row.
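
To illustrate how a path gets "messed up multiple times in a row", here is a rough sketch. It assumes the damage comes from a tool that reads the bytes as Latin-1 and re-saves them as UTF-8; the real culprit may use a different wrong encoding. The file name and the helpers `misread` and `undo_misread` are hypothetical:

```python
original = "音楽.txt"
sjis_bytes = original.encode("shift_jis")

# The raw SHIFT-JIS bytes are not valid UTF-8.
try:
    sjis_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False


def misread(raw: bytes, wrong: str = "latin-1") -> bytes:
    """Decode with the wrong encoding, then re-save the result as UTF-8."""
    return raw.decode(wrong).encode("utf-8")


def undo_misread(mangled: bytes, wrong: str = "latin-1") -> bytes:
    """Reverse one misread() step, assuming the wrong encoding is known."""
    return mangled.decode("utf-8").encode(wrong)


once = misread(sjis_bytes)   # first round of mojibake
twice = misread(once)        # a second tool mangles the already-mangled name

# Recovery only works by peeling the layers off in reverse order.
recovered = undo_misread(undo_misread(twice)).decode("shift_jis")
print(valid_utf8, recovered == original)
```

Each pass yields a perfectly valid UTF-8 name, so nothing downstream complains, and you have to guess both the number of passes and the wrong encoding to get the original back.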

I forget which Japanese emulator I was trying to run when I found all of this out, but suffice it to say I didn't enjoy the experience.

I buy digital Japanese doujin music on sites like booth.pm, and the zip files they provide extract "beautifully" on Linux if you simply `unzip` them.
Lots of Japanese products are switching or have switched to UTF-8, so I have no doubt that modern ZIP files will extract without a problem.
Don't you mean `unzip -O shift-jis` them?
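
If your `unzip` build lacks `-O`, something similar can be sketched with Python's `zipfile` module. This assumes the archive's legacy names are SHIFT-JIS. It relies on `zipfile` decoding names as CP437 when the UTF-8 flag (bit 11 of the general-purpose flags) is not set, which can then be reversed. `recover_name` and `extract_with_encoding` are hypothetical helpers, not part of any library, and there is no protection against `..` in entry names:

```python
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is stored as UTF-8


def recover_name(info: zipfile.ZipInfo, legacy_encoding: str = "shift_jis") -> str:
    """Return the entry name, re-decoding it if it was not flagged as UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename
    # Undo zipfile's CP437 decoding to get the raw bytes back, then decode
    # them with the encoding we assume the archive was created with.
    return info.filename.encode("cp437").decode(legacy_encoding)


def extract_with_encoding(archive, dest, legacy_encoding: str = "shift_jis") -> None:
    """Extract every entry under dest using the recovered names."""
    dest = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            target = dest / recover_name(info, legacy_encoding)
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
```

Usage would look like `extract_with_encoding("album.zip", "out")`, where both names are placeholders. Entries that already carry the UTF-8 flag are left alone, which matches the expectation that modern archives extract without special handling.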
Except that Linux does support several filesystems that do claim to store the filenames in a specific encoding and therefore the kernel must do conversion. Mostly Windows FSes, but nowadays case-insensitive ext4 also applies.
These are exceptions, not the norm. The VFS layer does not care.