Hacker News
by zlynx 3212 days ago
Easily moving from bytes to strings and back is the only approach that makes sense for Go. It runs on POSIX for the most part, and every. single. POSIX. API. is done in bytes. Not Unicode. Bytes.

Languages like Python 3 that try to be so Unicode-pure that they crash or ignore legal Linux filenames are insane.

1 comment

I would dare say that the fact that Linux filenames don't have to be valid strings (i.e. they can be arbitrary byte sequences that cannot be meaningfully interpreted using the current locale encoding) is the insane part.

But does POSIX require support for arbitrary byte sequences in filenames, or does it merely use bytes (in locale encoding) as part of its ABI? I suspect the latter, since OS X is Unix-certified, and IIRC it uses UTF-16 for filenames on HFS+, so presumably their POSIX API implementation maps to that somehow. If that's correct, then that's also the sane way forward: for the sake of POSIX compatibility, use byte arrays to pass strings around, but for the sake of sanity, require them to be valid UTF-8.
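
As a rough sketch of that last idea (bytes at the API boundary, but rejected unless they are valid UTF-8), something like the following would do. The function `checkName` and the error value are hypothetical names for illustration, not part of any real API; `utf8.Valid` is from Go's standard library.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidName is a hypothetical error for names that are not valid UTF-8.
var errInvalidName = errors.New("filename is not valid UTF-8")

// checkName (hypothetical) accepts a filename as raw bytes, as a POSIX-style
// API would, but refuses anything that does not decode as UTF-8.
func checkName(name []byte) error {
	if !utf8.Valid(name) {
		return errInvalidName
	}
	return nil
}

func main() {
	fmt.Println(checkName([]byte("notes.txt")))      // <nil>
	fmt.Println(checkName([]byte{'a', 0xff, '.'})) // filename is not valid UTF-8
}
```

The interface stays byte-based, so nothing about the calling convention changes; only the set of accepted names is narrowed.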