
I would dare say that the fact that Linux filenames don't have to be valid strings (i.e. they can be arbitrary byte sequences that cannot be meaningfully interpreted using the current locale encoding) is the insane part.
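To make the complaint concrete, here is a minimal Python sketch. Python is an arbitrary choice for illustration, and the example name is an assumption: `b"caf\xe9.txt"` is what a program writing "café.txt" in Latin-1 would leave on a Linux filesystem. The kernel accepts those bytes, but they are not valid UTF-8. `try_strict_decode` is a hypothetical helper. The `surrogateescape` error handler is Python's own workaround (PEP 383) for carrying such names around losslessly as "strings".

```python
from typing import Optional

# A filename as the kernel sees it: raw bytes. 0xE9 is "é" in Latin-1,
# but here it is a UTF-8 lead byte with no valid continuation bytes.
raw_name = b"caf\xe9.txt"


def try_strict_decode(name: bytes) -> Optional[str]:
    """Return the name as text if it is valid UTF-8, else None (hypothetical helper)."""
    try:
        return name.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return None


# PEP 383 escape hatch: each undecodable byte becomes a lone surrogate
# (0xE9 -> U+DCE9), so the exact original bytes can be recovered later.
escaped = raw_name.decode("utf-8", "surrogateescape")
round_trip = escaped.encode("utf-8", "surrogateescape")

print(try_strict_decode(raw_name))  # None
print(repr(escaped))                # 'caf\udce9.txt'
print(round_trip == raw_name)       # True
```

The "string" Python hands back is not really text: it contains a code point that no encoder will accept without the same special error handler.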

But does POSIX require support for arbitrary byte sequences in filenames, or does it merely use bytes (in the locale encoding) as part of its ABI? I suspect the latter, since OS X is Unix-certified, and IIRC it uses UTF-16 for filenames on HFS+, so presumably its POSIX API implementation maps to that somehow. If that's correct, then that's also the sane way forward: for the sake of POSIX compatibility, use byte arrays to pass strings around, but for the sake of sanity, require them to be valid UTF-8.
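A minimal sketch of that policy, assuming the check is applied to a single path component at the point where the name crosses the API boundary. `validate_filename` is a hypothetical helper, not an existing OS or library call. The interface still takes and returns bytes, so the ABI is unchanged; it simply refuses anything that is not valid UTF-8.

```python
def validate_filename(name: bytes) -> bytes:
    """Accept a filename component passed as bytes (POSIX-style), but only
    if it is valid UTF-8. Hypothetical policy check, not a real OS API."""
    if not name:
        raise ValueError("empty filename")
    if b"\x00" in name or b"/" in name:
        # NUL and '/' can never appear inside a single POSIX filename component.
        raise ValueError("NUL and '/' are not allowed in a filename component")
    try:
        # Strict decoding rejects malformed sequences, overlong forms
        # and encoded surrogates.
        name.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"filename is not valid UTF-8: {exc}") from None
    return name  # still bytes: callers keep the byte-oriented interface


ok = validate_filename("café.txt".encode("utf-8"))  # accepted, returned unchanged

try:
    validate_filename(b"caf\xe9.txt")  # Latin-1 "é": rejected
except ValueError as err:
    print(err)
```

The design choice is that validation happens once, at the boundary, so everything behind it can treat filenames as text without defensive decoding.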