| HN Mirror

By default, find separates by newline; this is human-friendly, but breaks if an attacker/script/... puts a newline in the filename.

The UNIX filesystem, qua filesystem, doesn't have a character set, just NUL-terminated strings. On the plus side, it's simple to handle, and means that retrofitting UTF-8 or another encoding is pretty easy. On the downside, two bytestrings that Unicode-canonicalize to the same value may name different files, which is surprising for humans.

It's notable that many of early UNIX' competitors were much more full-fledged systems, featuring full-fledged record-oriented files and typed data instead of UNIX' bytestrings-everywhere approach.