Hacker News
by theiebrjfb 479 days ago
Yet another reason to use Linux everywhere. It is 2025 and Windows (and probably Mac) users have to deal with weird Unicode filesystem issues. Good luck putting Chinese characters or emoticons into filenames.

An ext4 filename can be at most 255 bytes long. That is the only legacy limit you have to deal with as a Linux user. And even that can be avoided by using more modern filesystems.
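A side note on that limit: on ext4 it is counted in bytes of the encoded name rather than in characters, so non-ASCII names reach it sooner. A minimal sketch, assuming UTF-8 encoded names; the helper name `fits_name_limit` is made up for illustration:

```python
NAME_MAX = 255  # ext4 limit for one path component, counted in bytes


def fits_name_limit(name: str, limit: int = NAME_MAX) -> bool:
    """Return True if the UTF-8 encoding of `name` fits within `limit` bytes."""
    return len(name.encode("utf-8")) <= limit


ascii_name = "a" * 255          # 255 characters, 255 bytes
accented_name = "\u00e9" * 128  # 128 characters ("é"), 256 bytes in UTF-8

print(fits_name_limit(ascii_name))     # True
print(fits_name_limit(accented_name))  # False
```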

And we get filesystem level snapshots etc...

2 comments

You have the same, if not worse, issue on Linux with filenames that aren’t valid UTF-8 sequences. Not to mention that on Linux switching the locale may change the interpretation of filenames as characters, which isn’t the case with NTFS.
> Not to mention that on Linux switching the locale may change the interpretation of filenames as characters, which isn’t the case with NTFS.

If you change the locale to an uninstalled one, then yes. But if the locale is installed, then I don't see a problem.

  echo $LANG
  # output: en_US.UTF-8

  touch fusée.txt
  LANG=fr_FR.UTF-8 ls
  # output: 'fus'$'\303\251''e.txt'

  sudo locale-gen fr_FR.UTF-8
  sudo update-locale
  LANG=fr_FR.UTF-8 ls
  # output: fusée.txt

Are you maybe using a non-UTF-8 locale?

Yes, I mean locales like fr_FR.ISO-8859-15, ja_JP.SJIS or zh_CN.GBK.

While these probably aren’t used much anymore, it still means that your filenames can break just by setting an environment variable. There are also issues like the one described here: https://news.ycombinator.com/item?id=16992546
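To make the point concrete: the filename from the example above is just a byte sequence on disk, and what it looks like depends on the character set used to decode it. A rough sketch in Python, using Python's codec names (`iso8859_15`, `shift_jis`, `gbk`) as stand-ins for the charsets of those locales; `interpret` is a made-up helper, and real tools may render undecodable bytes differently:

```python
RAW_NAME = b"fus\xc3\xa9e.txt"  # bytes stored by `touch fusée.txt` in a UTF-8 locale


def interpret(raw: bytes, codec: str) -> str:
    """Decode filename bytes with the given codec, marking invalid bytes with U+FFFD."""
    return raw.decode(codec, errors="replace")


for codec in ("utf-8", "iso8859_15", "shift_jis", "gbk"):
    # Same bytes on disk, different characters on screen.
    print(f"{codec:>10}: {interpret(RAW_NAME, codec)}")
```

Only the UTF-8 interpretation gives back the name that was typed; the other three show different characters for the same two bytes.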

I see two points here. First, you did not read the article and did not see the footnote saying that these are valid on Linux as well.

Second, your comment shows that you lack knowledge of Linux as well. Linux, as I wrote in the footnote, accepts anything but 0x00 (null) and 0x2F (“/”). Other than that, every byte is valid in a file name. If you consider these a problem, I'd like to remind you that the 2048 surrogate code points are a really small subset of the unrenderable sequences allowed on Linux.
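As a rough illustration of that rule, here is a sketch of a validity check for a single path component on Linux. It works on bytes, since that is what the kernel sees. The function name and the 255-byte default (the ext4 limit mentioned upthread) are assumptions of this sketch, not a kernel API:

```python
def is_valid_linux_component(name: bytes, name_max: int = 255) -> bool:
    """Check one path component (not a whole path) against the byte-level rules."""
    if not name or len(name) > name_max:
        return False
    # Only NUL (string terminator) and "/" (path separator) are forbidden.
    return b"\x00" not in name and b"/" not in name


print(is_valid_linux_component(b"fus\xc3\xa9e.txt"))  # True: valid UTF-8
print(is_valid_linux_component(b"\xff\xfe\n\x1b"))    # True: invalid UTF-8, newline, ESC
print(is_valid_linux_component(b"a/b"))               # False: contains "/"
```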

Everyone is free to have an opinion, but please do your due diligence before making bold claims.

> Linux, as I wrote in the footnote, accepts anything but 0x00 (null) and 0x2F (“/”)

POSIX 2024 encourages (but doesn’t require) implementations to disallow newline in file names, returning EILSEQ if you try to create a new file or directory with a name containing a newline. Thus far Linux hasn’t adopted that recommendation, but I personally hope it does some day.
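One commonly cited reason for that recommendation is that many tools exchange file lists as newline-separated text, which becomes ambiguous once a name can contain a newline. A small sketch of the problem (the file names are made up):

```python
names = ["report.txt", "evil\nname.txt"]  # two files, one with an embedded newline

# Newline-delimited listing, as produced when piping plain `find` output.
listing = "\n".join(names)
parsed_by_line = listing.split("\n")

# NUL-delimited listing (the `find -print0` / `xargs -0` style) stays unambiguous,
# because NUL can never appear in a file name.
listing0 = "\0".join(names)
parsed_by_nul = listing0.split("\0")

print(len(parsed_by_line))  # 3: the second name was split in two
print(len(parsed_by_nul))   # 2
```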

For backward compatibility, it would have to be a mount option. It could be done at VFS level so it applies to all filesystems.

Personally I would go even further and introduce a “require_sane_filenames” mount option, which would block you (at the VFS layer) from creating any file name containing invalid UTF-8 (including overlong sequences and UTF-8 encoded surrogates), C0 controls or (UTF-8 encoded) C1 controls.
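For illustration, the check behind such a mount option could look roughly like the sketch below. Note that `require_sane_filenames` is a hypothetical option and `is_sane_filename` is a made-up helper; this is user-space Python, not kernel code. It relies on Python's strict UTF-8 decoder, which rejects overlong sequences and UTF-8 encoded surrogates:

```python
def is_sane_filename(name: bytes) -> bool:
    """Return True if `name` is valid UTF-8 with no C0 or C1 control characters."""
    try:
        # Strict decoding rejects invalid bytes, overlong forms and encoded surrogates.
        text = name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    for ch in text:
        cp = ord(ch)
        # C0 controls are U+0000-U+001F, C1 controls are U+0080-U+009F.
        if cp <= 0x1F or 0x80 <= cp <= 0x9F:
            return False
    return True


print(is_sane_filename("fus\u00e9e.txt".encode("utf-8")))  # True
print(is_sane_filename(b"a\nb"))           # False: newline is a C0 control
print(is_sane_filename(b"\xc0\xaf"))       # False: overlong encoding of "/"
print(is_sane_filename(b"\xed\xa0\x80"))   # False: UTF-8 encoded surrogate U+D800
print(is_sane_filename(b"\xc2\x85"))       # False: U+0085 (NEL), a C1 control
```

DEL (U+007F) is not part of C0 or C1, so this sketch lets it through; a real implementation might well want to reject it too.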

Also I think it would be great if filesystems had a superblock bit that declared they only supported “sane filenames”. Then even accessing such a file would error because it would be a sign of filesystem corruption.

I did not know this. I know that ZFS has a "utf8only" option, but I'm not sure about other filesystems.