by bluejekyll 3330 days ago
This is a great example of something else about software. As software grows in usage and use cases, it starts bumping up against edge conditions which need to be handled for various reasons.

Cargo now is becoming stronger and more stable because of bugs like this being discovered. All software goes through this growth cycle. It's great to see these things worked out in the various projects that support Rust.

There is another point here though; anytime the question comes up to just rewrite a piece of software, throw out all the technical debt, it's not as straightforward as it seems. Remember, together with that technical debt lies a lot of valuable learnings written into the code. I haven't worked on Windows directly in years, but I never knew that NUL was a reserved word as a file. I would, and probably still will make this mistake in the future.

Which makes me wonder, has anyone written a file name validation crate that guarantees that you're not writing to any reserved words on a filesystem of the host OS? A quick search of crates.io doesn't turn anything up.


It also shows how necessary it is to have some sort of deprecation process. Maintaining nonsensical landmine features for compatibility with an operating system released 36 years ago is putting the interests of MS's lazy long-term users ahead of the interests of its current users. Even if MS maintained a policy of only removing functionality after a 10-year deprecation period, this "feature" would have been gone long ago. Transitions must be orderly, but they should still happen.

It's nice that Rust's toolchain is better able to live with Windows' crazy ecosystem, but that doesn't make Windows any less crazy.

If you think Microsoft supports features long-term out of laziness then you haven't been paying attention. It's a very deliberate choice that helped them grow their business and keep customers.

Transitions are nice from a development perspective but I can guarantee you'll never hear someone who uses your library happy that they need to rewrite parts of it.

Also, Windows doesn't have a monopoly on bizarre filenames/features/etc.; you can find plenty of things in the *nix family as well.

Lastly, Rust is one of the few projects I've seen that has phenomenal Windows support. It's something that's really appreciated and is going to help them capture markets that other software won't.

> If you think Microsoft supports features long-term out of laziness then you haven't been paying attention.

Misreading. GP talked about MS's lazy long term users, "lazy" applies to the users not to Microsoft.

> Also, Windows doesn't have a monopoly on bizarre filenames/features/etc.; you can find plenty of things in the *nix family as well.

Like what? I'm not aware of special file names in arbitrary directories. Only in known/documented ones like /proc or /dev.

I'd say *nix OSes are too lax in what they allow as anything without a zero byte is valid.

In the order I happen to think of them: Filenames may be straightforward on the filesystem level, but a lot of UNIX programs do weird things with them. Many programs use "-" to mean STDIN or STDOUT as appropriate where it is used. Bash has a somewhat ill-conceived feature where it synthesizes a /dev/tcp/$host/$port filesystem that will write to TCP or UDP sockets. Most people don't know about this, and a few people think it's a UNIX feature rather than a bash-ism.

The fact that multiple /s will be normalized to be the same as one sometimes trips up security code or code trying to validate that some particular file isn't used (i.e., checking that the filename doesn't start with /dev or a list of other blacklisted directories will fail if the user passes //dev).
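To make the //dev pitfall concrete, here is a minimal Python sketch (not from the thread). BLOCKED_DIRS and both function names are made up for illustration. The check only normalizes the path string; it does not resolve symlinks or bind mounts, so treat it as an illustration and not as a complete defense.

```python
import posixpath
import re

# Hypothetical denylist, for illustration only.
BLOCKED_DIRS = ("/dev", "/proc")


def naive_is_blocked(path: str) -> bool:
    """The fragile check: compare the raw string against each prefix."""
    return any(path.startswith(d) for d in BLOCKED_DIRS)


def is_blocked(path: str) -> bool:
    """Normalize first, then compare whole path components."""
    # normpath collapses "a//b", "." and "..", but it keeps exactly two
    # leading slashes, so squash any leading run of slashes ourselves.
    normalized = re.sub(r"^/+", "/", posixpath.normpath(path))
    return any(
        normalized == d or normalized.startswith(d + "/") for d in BLOCKED_DIRS
    )


if __name__ == "__main__":
    for candidate in ("/dev/null", "//dev/null", "/tmp/../dev/null", "/developer/x"):
        print(candidate, naive_is_blocked(candidate), is_blocked(candidate))
```

Comparing against d + "/" instead of the bare prefix also avoids flagging unrelated names such as /developer.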

Symlinks! Oh, gosh, symlinks. Were this not a stream-of-consciousness dump they probably should come first. You can do terrible things with symlinks, like upload a tarball or zip file that creates a symlink to an arbitrary location in the system, then use that symlink reference as a directory reference to plop a file down. (Some archivers prevent this, others don't.)
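A rough Python sketch of the kind of check a cautious archiver might run before extracting, assuming a POSIX host. find_unsafe_members and build_demo_archive are hypothetical names; only the standard tarfile, os, io and tempfile modules are used. It conservatively flags every link, plus any member whose path resolves outside the destination.

```python
import io
import os
import tarfile
import tempfile


def find_unsafe_members(archive: tarfile.TarFile, dest: str) -> list:
    """Return names of members that are links or that escape dest."""
    dest_root = os.path.realpath(dest)
    unsafe = []
    for member in archive.getmembers():
        # Where would this member land if it were extracted under dest?
        target = os.path.realpath(os.path.join(dest_root, member.name))
        escapes = os.path.commonpath([dest_root, target]) != dest_root
        if member.issym() or member.islnk() or escapes:
            unsafe.append(member.name)
    return unsafe


def build_demo_archive() -> tarfile.TarFile:
    """Build an in-memory tarball that mimics the attack described above."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("evil")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc"  # symlink pointing outside the archive
        tar.addfile(link)
        tar.addfile(tarfile.TarInfo("evil/passwd"))  # written through the link
        tar.addfile(tarfile.TarInfo("docs/readme.txt"))  # harmless
        tar.addfile(tarfile.TarInfo("../outside.txt"))  # plain path traversal
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


if __name__ == "__main__":
    print(find_unsafe_members(build_demo_archive(), tempfile.mkdtemp()))
```

Note that "evil/passwd" passes the path check on its own, which is why the sketch rejects the symlink itself instead of trying to reason about what is later written through it.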

Also, /dev is just a convention, it's possible to place device nodes anywhere you want.

You can also pretty much mount arbitrary things in arbitrary places via bind mounts. Hard links can also cause some fun with code that assumes file systems aren't cyclic. Windows technically has a lot of these features but they're harder to get to and less well known whereas UNIX uses the various links in base Linux installs and they're readily available.

Is there any particular reason not to have something like /dev/tcp as a real filesystem, rather than a pretend game that bash likes to play?
There were several implementations of that idea in the early 1980s. The following paper describes one of them.

More Taste: Less Greed? or Sending UNIX to the Fat Farm[0] describes a V7 derivative that had /dev/deuna, /dev/arp, /dev/ip, and /dev/udp.

[0] http://www.collyer.net/who/geoff/taste.pdf

Actually, the stronger case is that the feature should be removed from bash. While it's hard to point at a specific security guarantee that UNIX makes that bash violates by making TCP available via the pseudo-file system, it is a non-trivial ambient contribution to general insecurity for UNIX systems. (People itching to reply to that sentence, please parse it carefully first; I chose the adjectives quite carefully. In particular, I did not just call UNIX "generally insecure".)
Symlinks are a poor example, IMO. Yes, they need to be carefully handled for security reasons. But they also offer great flexibility that is actually widely used, and that wouldn't be available through other mechanisms.
To paraphrase: Windows NUL is a poor example, IMO. Yes, it needs to be carefully handled for reasons. But it also offers great flexibility that is actually widely used, and that wouldn't be available through other mechanisms.

I rest my case. ;-)

Just look at Mac OS X, which is also from the Unix family. It has the feature of decomposing precomposed characters in file names, so if your software writes a file named "café" (caf\xc3\xa9), and later lists the directory, it will find a file named "café" (cafe\xcc\x81). That tends to confuse software which expects to find a file with the same name after creating it, like for instance git.

For a while, if you were in a team in which some developers were on Linux and others were on Mac OS X, and someone on the Linux side checked in a file named with a diacritic, on the Mac OS X side the file appeared to have been deleted (and a new untracked file with the "same name" appeared). Later git grew special code to work around this misfeature.
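The two byte sequences quoted above can be reproduced with Python's unicodedata module. This is a small sketch of the general idea (normalize both names before comparing), not a description of what git actually does; same_filename is a hypothetical helper, and the real HFS+ rules differ from plain NFD in some details.

```python
import unicodedata

composed = "caf\u00e9"  # precomposed "é", as the program wrote it
decomposed = unicodedata.normalize("NFD", composed)  # "e" + combining acute

print(composed.encode("utf-8"))    # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'cafe\xcc\x81'


def same_filename(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normal form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)               # False: different code points
print(same_filename(composed, decomposed))  # True: same text once normalized
```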

And yes, Linux has the "bizarre feature" of being way too permissive. A filename is a sequence of bytes of which only the null byte and the slash are forbidden, and only a single or double dot have special meaning; one can have files named with control characters, and/or with something which is not valid for the current character encoding (LC_CTYPE), leading to pain for languages which insist that a string must be always valid Unicode (this includes Rust).

But yeah, nothing compares to the madness that is forbidding simple names like "nul" or "con" or "aux" (alone or followed by any extension) in every single directory, made worse by the fact that you can create files with these names if you use a baroque escaping syntax (which is not available for every API), confusing every other program which does not carefully do the same.

And let's not forget about the fact that the file you just created might not be readable or writable the next instant, because some other process (usually some sort of "antivirus") decided to open it in an exclusive mode. I've seen several projects add retry loops when opening (or moving, or deleting) a file on Windows, to work around that issue.
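For illustration, such a retry loop might look roughly like this in Python. retry_on_sharing_violation and its parameters are hypothetical, and the sketch assumes the failure surfaces as PermissionError, which is how Python usually reports a file being held open by another process on Windows.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_sharing_violation(
    operation: Callable[[], T], attempts: int = 5, delay: float = 0.05
) -> T:
    """Call operation(), retrying with exponential backoff on PermissionError."""
    for attempt in range(attempts):
        try:
            return operation()
        except PermissionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the real error
            time.sleep(delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    import os
    import tempfile

    # Example: open a freshly created file, tolerating a briefly held lock.
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    with retry_on_sharing_violation(lambda: open(path, "w")) as handle:
        handle.write("hello")
```

A bounded number of attempts matters here: a genuine permissions problem raises the same exception and should eventually be reported instead of retried forever.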

> It has the feature of decomposing precomposed characters in file names

I was under the impression that the new APFS stopped trying to understand bytes in filenames at all, thereby switching from 'confusion' to [tableflip] as a policy (which is likely an improvement, but also amuses me on the basis it's nice to know [tableflip] is about the only response anybody has to certain unicode-isms)

(Note that Rust just requires the built-in string type to be valid Unicode; you are free to manipulate other kinds of strings, which is exactly how the OS string problem is solved. That also gives you a chance to explicitly handle the errors.)
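The same split between "text" and "whatever bytes the filesystem holds" can be sketched in Python (used here instead of Rust purely for illustration; the example file name is made up). Strict decoding forces the error to be handled explicitly, while the surrogateescape error handler keeps the original bytes recoverable.

```python
from typing import Optional

# A name that is legal on Linux but is not valid UTF-8 (Latin-1 encoded "é").
raw_name = b"caf\xe9.txt"


def decode_strict(name: bytes) -> Optional[str]:
    """Return the name as text, or None if it is not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Strict decoding makes the caller deal with the bad name explicitly.
print(decode_strict(raw_name))  # None

# surrogateescape smuggles the undecodable byte through as a lone surrogate,
# so the exact original bytes can be recovered later.
lossless = raw_name.decode("utf-8", "surrogateescape")
print(lossless.encode("utf-8", "surrogateescape") == raw_name)  # True
```

Python's os.fsencode and os.fsdecode apply this kind of round trip using the filesystem encoding, and os.listdir returns bytes names when it is given a bytes path.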

    And let's not forget about the fact that the file you just
    created might not be readable or writable the next instant
    because some other process (usually some sort of
    "antivirus") decided to open it in an exclusive mode.
THIS. Spent quite a long time trying to reproduce a Windows-only bug with the old Rails 2 gem unpacker caused by exactly this; the code would create a directory "foo-1.2.3" and then immediately try to write files to it and fail because of an exclusive lock - on an empty directory.
Exclusive mode is useful when used for good reasons - i.e. to get snapshot semantics (no-one else can change this) while reading, or implement atomic changes (no-one else can see the change halfway) when writing.

The problem on Windows is that too many APIs decided that exclusive should be the default mode if none is specified - which is the safer choice in a sense that it gives the most guarantees (and the least surprise) to the caller, but arguably the adverse effects it causes on other apps are more surprising and harmful in the end.

I agree with the pain points that you described.

Each OS has its own set of weird, broken and surprising behavior, most of it in the name of backwards compatibility. There is a group of people that sees one mess as bearable and all the others as totally brain-dead. There are other groups that have a somewhat different opinion.

Everything sucks. Which one sucks less? I pick the one that I know more about.

Note that many OS operations in general require retry loops on POSIX systems.
Well, Windows technically supports files with the reserved names - if you use the right APIs - but they break many programs including Explorer. You could make an analogy to Unix filenames with spaces or newlines, which can be created but don't work properly with some tools. (For spaces, try 'make CFLAGS="-I/path/with spaces/"' - there is no way to escape it or otherwise make it work. Newlines break a lot of stuff.)
IIRC you can `make "CFLAGS=-I/path/with spaces/"`
That doesn't make a difference - regardless of where you put the opening quote, make gets the string "CFLAGS=-I/path/with spaces" as argv[1]. The quotes do help, as otherwise it gets split up into multiple arguments to make.

But actually, I was wrong - GNU make passes strings to execute to the shell, so you can use nested quotes: CFLAGS='"-I/path/with/spaces"'. Not sure why I thought differently. The shell itself doesn't work this way, though: when it splits a variable into multiple arguments, it just splits by spaces rather than doing any fancier processing. So there are issues with shell scripts.
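The difference between the two behaviours can be approximated in Python: shlex.split models how a shell parses a command line (quotes are honoured), while plain str.split is a rough stand-in for what happens to an unquoted variable expansion (whitespace splitting only, assuming a default IFS). This is an analogy, not an exact model of any particular shell.

```python
import shlex

# The value of CFLAGS from the example above, including the nested quotes.
cflags = '"-I/path/with spaces"'

# Parsed like a shell command line: the quotes group the words together.
as_command_text = shlex.split(cflags)

# Split like an unquoted $CFLAGS expansion: quotes are just ordinary characters.
as_unquoted_expansion = cflags.split()

print(as_command_text)        # ['-I/path/with spaces']
print(as_unquoted_expansion)  # ['"-I/path/with', 'spaces"']
```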

The Windows command line client for PostgreSQL used to produce confusing errors on my machine because my development source code directory happened to be called "C:\dev"

What constitutes "bizarre" depends a lot on what your prior assumptions are.

But it isn't just a 10 year old feature no one uses. It seems if you write to the NUL file in any directory it still works the same as writing to /dev/null today. There might be scripts written yesterday that rely on that behavior.
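For code that has to run on both families, one way to avoid hard-coding either name is to let the runtime pick it. A tiny Python sketch (discard is a hypothetical helper; os.devnull is the standard constant, "/dev/null" on POSIX and "nul" on Windows):

```python
import os


def discard(data: bytes) -> int:
    """Write data to the platform's null device and return the byte count."""
    with open(os.devnull, "wb") as sink:
        return sink.write(data)


if __name__ == "__main__":
    print(os.devnull, discard(b"hello"))
```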
Joel Spolsky famously praised this policy of backward-compatibility at all costs which he called "The Raymond Chen Camp"[1]. Many agreed with him, but I always thought that Microsoft compatibility ideals were too radical to be real wisdom. At some point the list of features you try to keep compatibility with grows large enough that the Raymond Chen Way becomes unmaintainable.

The received wisdom of the 90s is wrong. Most users don't care about compatibility, as Apple's success has clearly shown, and most companies are now following the Apple road. Large enterprises care about compatibility, and they pay a lot, but this is not a forward-looking market. They'll keep buying new versions of your software because of the compatibility, but if compatibility is the only story you have to offer, you'll slowly lose that market.

I completely agree with you that Microsoft should have had a strategy for deprecating these features back in the 90s, when they were already old.

In this specific case of outdated filename restrictions, you could start with what they already did:

Windows NT 3.5 - Allow accessing all filenames with a special prefix (which they already did).

Windows NT 4.0 - Make it easy to migrate to sane filenames by providing an opt-in per-process flag that makes all APIs use them by default. At this point they can easily dogfood and migrate all Microsoft software to the new APIs, so you would be able to delete these pesky files in Explorer.

Windows 2000 - Make the new API flag default for all versions compiled with the latest version of the Windows SDK.

Windows XP - Make the new API default for any app without a special entry in the compatibility database.

Somewhere along the road, batch files (which are the only place where compatibility with the old filenames was necessary) could easily be made compatible by modifying the batch parser to replace redirections to NUL with redirections to \\?\devices\null or something akin. You may see some breakage in scripts which use NUL and CON in a non-standard way (e.g. as an argument), but the migration pain won't be huge, and you could still save an old script with a compatibility flag.

Microsoft obviously didn't take that way, and yeah, all the batch files written back in 1981 may still work without a hitch, but newer things keep breaking in strange ways.

[1] https://www.joelonsoftware.com/2004/06/13/how-microsoft-lost...

Newer things only break in strange ways because they're broken. So rather than break the old stuff, why not fix the new stuff?? - because after all, approximately the only criticism you can't level at the Windows NUL/PRN/COMx/etc. special names is that they're some kind of surprise that appeared suddenly out of nowhere! It's been this way for a very long time.

(I wonder if part of this is the rage of Unix fans discovering that portable means actually, you know, making an effort... and that there's more to it than just checking it builds on x86 Debian as well as x64 Ubuntu...)

You can't just say it's been that way for a long time so it's acceptable, because the industry (and for that matter the Internet) is getting fresh new people every day. You can't expect them not to be surprised, and you can't just arbitrarily require them to know something they haven't stumbled upon until after it caused problems.
"portable means actually, you know, making an effort"

When I hear portable, I immediately think of the Portable Operating System Interface.

> Most users don't care about compatibility, as Apple's success has clearly shown

Apple is not exactly big in the same markets where MS is big, e.g. enterprise. So while I agree that "most" users don't care, the very few who do care might be important customers for MS.

EDIT: grammar

I can't think of a single enterprise where devs don't use MacBook Pros. Sure they exist but I haven't run into it.
I can't think of a single enterprise where devs don't use a Dell supplied by the IT department.

The only MacBooks I've seen at the various meetups I've been to were at 'hip' dev shops.

They exist in large quantities - try every .NET shop for starters.
The government, federal, state, and local.
> Most users don't care about compatibility, as Apple's success has clearly shown,

by having 3% of the desktop market and 10% of the smartphone market?

Apple has 18% of the smartphone market: http://www.idc.com/promo/smartphone-market-share/vendor

And 7% of the PC market:

https://www.google.com/amp/s/amp.businessinsider.com/apple-m...

And greater than 10% in the US.

Even those numbers don't exactly scream users don't care about compatibility.
Apple market cap: ~700B, Microsoft market cap ~500B.
It's almost like they are both successful but for different reasons.
Market cap is a meaningless metric. It tells more about the state of mind of the general public (greed vs. fear) than about a company's well-being.
How does having a higher market cap equal users not caring about compatibility?
Market cap is a lottery ticket.

I would assign more meaning to cash hoard:

Microsoft: ~$100 billion. Apple: ~$250 billion.

Microsoft has taken a similar approach for adding long filename support to Windows 10[0].

[0] https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-...

take this post to /r/sysadmin and watch them bring out the pitchforks.
The best demonstration for backwards compatibility: https://www.youtube.com/watch?v=PH1BKPSGcxQ
> I haven't worked on Windows directly in years, but I never knew that NUL was a reserved word as a file.

It's not. It's a reserved word through the MS-DOS file redirection facilities. If you use the newer file API or you use the \\?\[path] convention, the reserved words are not an issue and you can create files named for them.

You have to use both, actually. The Unicode API and the \\?\ path prefix. It also astonishes me sometimes how many applications nowadays still choke on Unicode paths.
> I never knew that NUL was a reserved word as a file. I would, and probably still will make this mistake in the future

While we're here: NUL, COM<n>, LPT<n> and AUX are reserved.

> While we're here: NUL, COM<n>, LPT<n> and AUX are reserved.

Worse: they're reserved with any extension. Have a file in your repository called "aux.rs"? It will cause problems on Windows.

Which happened in Servo already: https://github.com/servo/servo/issues/1226
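A minimal sketch of the check being discussed, in Python. The name list is assembled from names mentioned in this thread (NUL, CON, AUX, PRN, COM<n>, LPT<n>); treating <n> as the digits 1 to 9 is an assumption, the list is not claimed to be exhaustive, and is_windows_reserved is a hypothetical helper name.

```python
import re

# Reserved device names as mentioned in the thread; digits 1-9 are an assumption.
_RESERVED = re.compile(r"^(con|prn|aux|nul|com[1-9]|lpt[1-9])$", re.IGNORECASE)


def is_windows_reserved(filename: str) -> bool:
    """Return True if a single file name (not a full path) hits a reserved name."""
    # Only the part before the first dot matters, so "aux.rs" is still reserved.
    stem = filename.split(".", 1)[0]
    return bool(_RESERVED.match(stem))


if __name__ == "__main__":
    for name in ("nul", "aux.rs", "con.py", "COM1.txt", "null", "console.log"):
        print(name, is_windows_reserved(name))
```

A project could run a check like this over its file list in CI on any platform, so that a reserved name is caught before someone tries to clone the repository on Windows.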
It's swings and roundabouts. Say I create an app or tool that happily resides in c:\proc\whatever then I turn my attention to creating a Linux version and specify /proc/whatever then ... boom? Sure, it's maybe a convoluted example, but for the creator of this "nul" package they got burned by something that's actually common knowledge in the MS world.

I think you need to be a wee bit pro-active and take a look at your potential deployment targets and try and guard against these types of naming issues. Unix and Linux aren't the only (one true) operating systems in the world.

The right solution, actually, is to use a library that gives you the right path for the thing that you need to do depending on the conventions of the platform. For example QStandardPaths in Qt: http://doc.qt.io/qt-5/qstandardpaths.html

  QString appDataDir = QStandardPaths::writableLocation(QStandardPaths::AppDataLocation);
  // ~/Library/Application Support/<APPNAME> on macOS
  // C:/Users/<USER>/AppData/Roaming/<APPNAME> on Windows
  // ~/.local/share/<APPNAME> on Linux
Still doesn't solve the case where the developer just wants to slap something in the root of the C: drive on windows from the outset (Cygwin I seem to remember defaults to c:\Cygwin for example...again slightly convoluted).

Also those locations are user specific, there's nothing there to support the use-case of an app that's available to all users, or might just be a system service (/daemon).

And CON, as Macha mentions in a sister comment. An idiom I remember from old times in DOS, for quickly writing some contents into a file - equivalent to `cat > myfile.txt` on Linux:

  COPY CON MYFILE.TXT
You can also do

    type con > myfile.txt
IIRC, CON was reserved, too, at one point (which would mean it most likely would be reserved today)?

Also, have fun trying to delete C:\Program Files\Xerox ;-)

I've done con.py on a Linux system a few times for net code in different projects and then realised I couldn't clone it on Windows. It comes up infrequently enough that you can forget.
I came across the concept of "Chesterton's fence" a while ago here on Hacker News, which I really like: https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence