by Dwedit 35 days ago
Windows allows unpaired surrogates in filenames, which makes those names invalid UTF-16. Likewise, Linux allows invalid UTF-8 byte sequences in filenames.

Because invalid UTF-16 strings can show up in various places within Windows, someone made a UTF-8 variant called "WTF-8", which lets unpaired surrogates survive a round trip.
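
To make the round trip concrete, here is a rough Python sketch. It is not a full WTF-8 implementation: it leans on Python's "surrogatepass" error handler, and the function names are made up for illustration. It also does not reject byte strings that contain an encoded surrogate pair, which, as far as I understand it, the actual WTF-8 spec forbids.

```python
def utf16_units_to_wtf8(units: bytes) -> bytes:
    """Convert little-endian UTF-16 code units to WTF-8-style bytes (hypothetical helper)."""
    # Decoding merges well-formed surrogate pairs into a single code point;
    # "surrogatepass" lets unpaired surrogates through instead of raising.
    text = units.decode("utf-16-le", errors="surrogatepass")
    # Unpaired surrogates are written in the 3-byte form that strict UTF-8 forbids.
    return text.encode("utf-8", errors="surrogatepass")


def wtf8_to_utf16_units(data: bytes) -> bytes:
    """Inverse of the helper above (hypothetical helper, no WTF-8 validation)."""
    text = data.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# "a", an unpaired high surrogate U+D800, then "b", as UTF-16LE code units.
ill_formed = b"a\x00\x00\xd8b\x00"
wtf8 = utf16_units_to_wtf8(ill_formed)

# The ill-formed name comes back unchanged after the round trip.
assert wtf8_to_utf16_units(wtf8) == ill_formed

# A strict UTF-8 decoder refuses the same bytes.
try:
    wtf8.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False
```

Here `strict_accepts` ends up False, so the bytes only make sense to code that knows it is dealing with this relaxed encoding.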


Indeed, Linux allows anything but "/" and "\0" in filenames. These days it's reasonable to refuse filenames that aren't valid UTF-8. But one must always validate first!
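
A minimal sketch of that validation step in Python, assuming the name arrives as raw bytes (for example from os.listdir called with a bytes path). Both function names are hypothetical, and the policy of rejecting rather than repairing is just one option.

```python
def is_valid_utf8_name(raw: bytes) -> bool:
    """Return True if raw is a single path component and strictly valid UTF-8."""
    # Linux itself only forbids "/" and NUL inside a filename; empty names are
    # rejected here as well since they cannot name a file.
    if not raw or b"/" in raw or b"\x00" in raw:
        return False
    try:
        raw.decode("utf-8")  # strict mode: raises on any invalid byte sequence
    except UnicodeDecodeError:
        return False
    return True


def lossless_display_name(raw: bytes) -> str:
    """Decode for display without losing the original bytes."""
    # "surrogateescape" maps each undecodable byte to a lone surrogate in the
    # U+DC80..U+DCFF range, so encoding with the same handler restores the bytes.
    return raw.decode("utf-8", errors="surrogateescape")


good = "café.txt".encode("utf-8")
bad = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8

assert is_valid_utf8_name(good)
assert not is_valid_utf8_name(bad)
assert lossless_display_name(bad).encode("utf-8", errors="surrogateescape") == bad
```

The second helper is there for tools that cannot refuse a name outright (a file manager, say) but still need to hand the exact bytes back to the kernel.
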
> Indeed, Linux allows anything but "/" and "\0" in filenames.

For what it’s worth, NT allows any 16-bit quantity but L'\\' (0x005C) in filenames (even nulls); it’s the Win32 layer on top of it that imposes all the other weird restrictions and mappings.

The NT Object Namespace itself indeed has no restrictions on filename characters except for "\", but once you reach a real filesystem like NTFS or FAT, the forbidden characters continue to be blocked.

https://projectzero.google/2016/02/the-definitive-guide-on-w...