by reaperducer 1037 days ago
It’s only forbidden for Windows users.

I think the only character that Macs don't allow is the colon. AFAIK, everything else is fair game, even emojis.

I just created a project with "ê" in the title. I wonder if I'll be able to share that with my Windows coworkers on Teams.

7 comments

> AFAIK, everything else is fair game, even emojis

FWIW, I generally expect emojis to be more compatible than other symbols, because they have no legacy meanings - ex. ™ has never been a path separator, or indeed anything else.

The main problem with "emojis" is that most of them live outside the Basic Multilingual Plane, and so make bad UTF-16 handling really obvious. This is a blessing in disguise, because it acts as a no-brown-M&Ms test: mangled emoji are more unarguably broken than not being able to use cuneiform or musical symbols.
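
To make the UTF-16 point concrete, here is a minimal Python sketch. Characters outside the Basic Multilingual Plane need two 16-bit code units (a surrogate pair), so code that assumes one unit per character miscounts or splits them. The helper name `utf16_units` is made up for illustration, and the characters are written as escapes since HN strips emoji.

```python
def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2

bmp_char = "\u2122"         # TRADE MARK SIGN, inside the BMP
astral_char = "\U0001F62D"  # LOUDLY CRYING FACE, outside the BMP

print(utf16_units(bmp_char))     # 1
print(utf16_units(astral_char))  # 2 (a surrogate pair)

# Naively cutting after one code unit leaves a lone surrogate,
# which is not valid UTF-16 on its own.
half = astral_char.encode("utf-16-le")[:2]
try:
    half.decode("utf-16-le")
    lone_surrogate_rejected = False
except UnicodeDecodeError:
    lone_surrogate_rejected = True
print(lone_surrogate_rejected)   # True
```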
Yeah, point; I probably should have said something more like "emoji shouldn't have alternative meanings, so they should work if your unicode support is functional (which is a big caveat)." On the bright side, yeah, emoji have been great at pushing things to handle unicode nicely and act as a... I dunno, natural fuzzing case?
The problems show up when that name gets stored in a VARCHAR column somewhere...
™ is not an emoji. It's part of the "Letterlike Symbols" block.
Oops, good point. I think I'm going to leave it as-is on account of not knowing of any actual emoji that HN allows, but yes that is technically incorrect on my part.
I think the confusion comes from the fact that some platforms render it as a drawing as if it was an emoji. I never understood why...
Speaking only for myself, the confusion comes purely from the fact that I think of all text as "ASCII" or "Unicode" (...or "something else that nobody should use in the modern era"), and I don't really distinguish within "valid unicode that isn't ascii".
™ is part of extended ASCII :)
> I think the only character that Macs don't allow is the colon.

Colon is allowed in the filesystem; it's displayed as a forward slash in the UI.

(There are historical reasons they do this: Classic Mac OS used colon as a path separator.)

And it was merely not recommended to use a full stop at the start of a filename. Device drivers (perhaps just those for storage devices) were usually named this way (and also marked invisible). There was an alleged chance the OS would try to load your non-driver file as a driver for the device it resided on, though I was unable to make it happen.
Yes. NTFS has no problem with non-latin letters and emojis in file names.
NTFS itself doesn't have that many forbidden characters (though ':' is one of them; it denotes an alternate data stream).

The reserved names people think of (CON, PRN, AUX, NUL, etc.) are not a filesystem limitation.

And while we are here: neither backslash nor forward slash is used in NTFS itself. It couldn't care less which character you use as a directory separator. Just be sure to update your APIs.
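
Since the reserved device names are enforced above the filesystem, a program that wants to avoid them has to check names itself. A minimal Python sketch, assuming the commonly documented Win32 list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); the function name `is_win32_reserved` is hypothetical, and the exact rules vary between Windows versions.

```python
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def is_win32_reserved(name: str) -> bool:
    """True if a single file name collides with a Win32 device name."""
    # The match is case-insensitive and ignores everything after the
    # first dot, so "nul.txt" is treated like "NUL".
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED

print(is_win32_reserved("nul.txt"))      # True
print(is_win32_reserved("console.txt"))  # False
```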

NTFS on non-Windows systems aside, I wonder whether there are any "pure NT" environments you can access in Windows where you can create and use these folders.
You can do some stuff under Cygwin that's bothersome under explorer.exe, like delete files with really long pathnames.
You just need to come in below whatever layer notices them. I'm sure they've been used to smuggle viruses in, so the layer that blocks them may get lower and lower.
I had to test it, but this is true. I just renamed my TODO.txt to [sobbing emoji].txt
Linux treats filenames as bytes and only disallows the ascii slash and null. All other bytes are fair game.
> [...] and only disallows the ascii slash and null. All other bytes are fair game.

There's also the special treatment of "", ".", and ".." (that is, a file or directory name consisting entirely of zero, one, or two dots), and the convention that a name starting with a dot is hidden.
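
Put together, the rules in this subthread come out roughly as the following Python sketch. It treats a name as raw bytes, as the kernel does, and checks a single path component. The function name `is_valid_linux_name` is made up, and length limits are deliberately not covered.

```python
def is_valid_linux_name(name: bytes) -> bool:
    """Check one path component (not a whole path)."""
    if name in (b"", b".", b".."):
        return False  # empty and dot entries are special, not ordinary names
    return b"/" not in name and b"\x00" not in name

print(is_valid_linux_name("ê.txt".encode("utf-8")))  # True
print(is_valid_linux_name(b"\xff\xfe not utf-8"))     # True: bytes, not text
print(is_valid_linux_name(b"a/b"))                    # False
```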

> and the convention that a name starting with a dot is hidden.

It is a UNIX shell implementation level convention. The file system and the kernel don't care, and a shell is not obliged to honour the convention.
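
A small Python sketch of that split: `os.listdir` returns what the directory actually contains, while `glob` follows shell-style rules and skips names starting with a dot. The file names are arbitrary examples.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    for name in (".hidden", "visible"):
        open(os.path.join(d, name), "w").close()

    # Straight from the directory: the dot file is just another entry.
    raw = sorted(os.listdir(d))
    # Shell-style pattern matching: "*" does not match a leading dot.
    globbed = sorted(os.path.basename(p) for p in glob.glob(os.path.join(d, "*")))

print(raw)      # ['.hidden', 'visible']
print(globbed)  # ['visible']
```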

Yes, and it was an absolutely horrible mistake in the other direction. If I give you a path “./something/example.txt” it could be a file called example.txt in a folder called something or it could be a single file called “something/example.txt”.
> disallows the ascii slash
It looks like non-utf8 is not allowed: https://superuser.com/questions/204287/what-characters-are-f...

Maybe it goes without saying, but for completeness: / is not allowed in filenames on Mac, and neither is the null terminator (0x00 or \0).

On Linux, only / and the null terminator are banned from filenames.

Sadly, you are not correct about slashes. They are allowed in filenames on Macs: https://alexwlchan.net/2021/slashes/
Thanks for the correction! It seems more complicated after looking more into it.

According to https://en.m.wikipedia.org/wiki/Filename , reserved characters are:

HFS:

> :

HFS+:

> : on disk, in classic Mac OS, and at the Carbon layer in macOS; / at the Unix layer in macOS

APFS:

> In the Finder, filenames containing / can be created, but / is stored as a colon (:) in the filesystem, and is shown as such on the command line. Filenames containing : created from the command line are shown with / instead of : in the Finder, so that it is impossible to create a file that the Finder shows as having a : in its filename.
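
The behaviour in that last quote amounts to swapping the two characters at the boundary between the Finder and the Unix layer. A toy Python sketch of the idea; `finder_to_posix` and `posix_to_finder` are made-up names, not macOS APIs.

```python
# One translation table works in both directions because the swap is symmetric.
_SWAP = str.maketrans({"/": ":", ":": "/"})

def finder_to_posix(display_name: str) -> str:
    """Name as typed in the Finder -> name as seen on the command line."""
    return display_name.translate(_SWAP)

def posix_to_finder(stored_name: str) -> str:
    """Name as seen on the command line -> name as shown in the Finder."""
    return stored_name.translate(_SWAP)

print(finder_to_posix("notes 1/2.txt"))  # notes 1:2.txt
print(posix_to_finder("12:30 log.txt"))  # 12/30 log.txt
```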

TIL about a 'carbon layer' and 'POSIX layer'

It doesn’t make sense tbh; it just causes confusion when someone is using the terminal. Slashes in file names are forbidden everywhere except on Mac, so a name has to be changed before you can send the file anywhere or use it in some apps. But I think colons are used much more in names. I don’t get why they did that.
Based on this logic, do we ban everything that is banned somewhere? Surely we can aim higher than the lowest common denominator.
Given the preponderance of 3 major operating systems, I'd think it's sensible for user-level applications to disallow creation of filenames that would cause problems on any of them. Except arguably that could even include using spaces or periods... obviously in an ideal world such restrictions wouldn't exist, but I'm not sure how to realistically push for such a world. E.g. my suggestion would be to reserve non-printable characters (below ASCII 32) for use as separators/delimiters in as many contexts as is workable. Obviously some sort of convention would then need to exist as to how they were displayed and typed in, and I very much doubt I'll ever see it happen, but I'm sure it would solve a lot of mis-parsing bugs that show up with frustrating regularity.
I suppose I’m actually criticising Microsoft and the backwards compatibility that is now dictating practices due to long gone limitations.

It is painful being forced to update things due to software changing underneath you, but there must be a middle road.

I see your point, but forbidding slashes, the weird Windows reserved names, maintaining case insensitivity (also Windows), and forbidding colons don't seem like the craziest restrictions on file names. It can definitely get out of hand, though, if you start excluding too much stuff. IIRC, Azure doesn't even allow slashes in storage account names, which greatly limits naming schemes and goes too far in my opinion.
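
For what it's worth, two of those restrictions are cheap to check up front. A rough Python sketch with hypothetical helper names: `case_collisions` groups names that would clash on a case-insensitive filesystem, and `has_forbidden_char` flags slashes and colons. Note that `str.casefold` is only an approximation of how any particular filesystem compares names.

```python
from collections import defaultdict

FORBIDDEN = set("/:")  # just the two characters discussed here

def case_collisions(names):
    """Group names that differ only by case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

def has_forbidden_char(name):
    return any(ch in FORBIDDEN for ch in name)

names = ["README.md", "readme.MD", "notes.txt", "12:30 log.txt"]
print(case_collisions(names))                       # [['README.md', 'readme.MD']]
print([n for n in names if has_forbidden_char(n)])  # ['12:30 log.txt']
```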
Slashes are used in paths, so most programs that aren’t Mac-exclusive would use them to build a path to a file. To make it work properly in cross-platform programs you’d need to write platform-specific code to handle that. It just adds complexity and possible errors. Even the system terminal doesn’t display it as a slash, and it doesn’t work if you type slashes.

For users of other platforms (at least 90% of the desktop market) it would just display as slashes. Just not implementing this workaround would make it predictable when moving and using files.

“:”? Really? It’s not 2001 anymore.
I guess you're criticizing Microsoft's NTFS, since it cannot use colons, either.

CON:? Really? It's not 1974 anymore.

People in glass houses shouldn't throw stones.

? No it is not 1974, but not everything is better.

Edit: argh, HN deleted the emoticon. Ironic.