by Anthony-G 1538 days ago
In 2009, David A. Wheeler wrote a comprehensive article covering problems with Unix/Linux/POSIX filenames¹. Given that the OS naïvely treats filenames as a simple stream of bytes, he advocated that developers use UTF-8 for encoding filenames. He mentioned the issue of multiple normalisation forms (such as NFC and NFD) being used to encode characters that have more than one Unicode representation, but glossed over it because such problems are “overshadowed by the terrible awful even worse problems caused by filenames all being in random unguessable charsets”.

I’m guessing that, by now, most developers on Unix-like systems would be using UTF-8 for filenames – though a decade after these articles were published, there still doesn’t seem to be any good/universal solution to the problem of characters with multiple Unicode representations.
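
To make the multiple-representation problem concrete, here is a minimal Python sketch. The name "café" and the helper `same_name` are my own illustrations, not anything from Wheeler's article. Both strings render identically, but they encode to different UTF-8 bytes, so a filesystem that compares raw bytes would treat them as two distinct filenames.

```python
import unicodedata

# Two ways to write "café" (illustrative example name):
composed = "caf\u00e9"     # é as one precomposed code point, U+00E9
decomposed = "cafe\u0301"  # plain e followed by combining acute accent, U+0301

# The UTF-8 byte sequences differ even though the text looks the same.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after Unicode normalisation."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(composed_bytes, decomposed_bytes)
    print(composed == decomposed)           # raw comparison
    print(same_name(composed, decomposed))  # comparison after normalising
```

Normalising before comparing only helps if every program that touches the name agrees on the same form, which is exactly the part that has no universal answer.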

¹ https://dwheeler.com/essays/fixing-unix-linux-filenames.html