by josefbacik 3595 days ago
This is just an accident of how some file systems are implemented and isn't actually guaranteed. If you did this on xfs you could still end up with a 0-length symlink if you crashed at just the right time.

The article is talking about atomic with respect to processes running at the same time as the operation, not with respect to a system crash.
xfs doesn't guarantee atomic renames in the same directory?

I thought that requirement was from POSIX.

Does xfs not conform to POSIX?

I think what josefbacik means is that there is no guarantee that the original symlink under its temp name has actually been written to disk. After the `rename()`, there is still no guarantee.

I think this is true in many filesystems, not just `xfs`.

If your atomicity requirement is that you never want the file to disappear from the POV of an external process, then the OP's method is sufficient. If you want crash-proofing as well, then you will need an fsync() -- preferably on the tempfile BEFORE the rename().
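
A minimal sketch of the OP's method as described here, covering only atomicity with respect to concurrent processes and not crashes. `replace_symlink` and its parameter names are hypothetical, and the temp name is assumed to live in the same directory as the link so that rename() stays on one filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: repoint `linkpath` at `target` so that other
 * processes always see either the old link or the new one. */
int replace_symlink(const char *target, const char *tmppath,
                    const char *linkpath)
{
    unlink(tmppath);                      /* drop a stale temp name, if any */
    if (symlink(target, tmppath) != 0)    /* build the new link off to the side */
        return -1;
    if (rename(tmppath, linkpath) != 0) { /* swap it over the old link */
        int saved = errno;
        unlink(tmppath);
        errno = saved;
        return -1;
    }
    return 0;
}
```

Nothing here calls fsync(), so it says nothing about what is on disk after a crash.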

No, you want to fsync() on the directory, not the "tempfile", and after the operation, not before. Consider:

            d=open(".", O_RDONLY), unlink("t");
    /* 1 */ symlink("new","t");
    /* 2 */ rename("t", "link");
    /* 3 */ fsync(d);
    /* 4 */ close(d);

If we crash at (1), nothing has happened yet. At (2), we might have "t" or we might not. At (3), we might have "t" or not, and "link" might point to "new" or "old", but it can't point to anything else (or be empty). Finally, at (4), the change cannot be reverted.

You can insert a second fsync() where you suggest at point (2), but all this will guarantee is that we will have "t" in the directory because the symlink contents are part of the directory they live in. This might be useful for some applications, but the cost of two disk writes is high enough it may be worth redesigning your application.
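
For what it's worth, here is the numbered sequence above as a sketch with error checking. `replace_symlink_durable` and its parameters are made-up names; it uses the `*at()` calls so the same directory descriptor serves both the operations and the fsync(). Whether a directory fsync() alone is enough on every filesystem is exactly what this thread disputes, so treat it as an illustration of the claim, not a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: repoint dirpath/linkname at `target`, then fsync
 * the directory that holds the link. Names are relative to `dirpath`. */
int replace_symlink_durable(const char *dirpath, const char *target,
                            const char *tmpname, const char *linkname)
{
    int saved;
    int d = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (d < 0)
        return -1;

    unlinkat(d, tmpname, 0);                     /* stale temp name, if any */
    if (symlinkat(target, d, tmpname) != 0)      /* (1) create the new link */
        goto fail;
    if (renameat(d, tmpname, d, linkname) != 0)  /* (2) swap it into place */
        goto fail_unlink;
    if (fsync(d) != 0)                           /* (3) flush the directory */
        goto fail;
    return close(d);                             /* (4) */

fail_unlink:
    saved = errno;
    unlinkat(d, tmpname, 0);
    errno = saved;
fail:
    saved = errno;
    close(d);
    errno = saved;
    return -1;
}
```

A second fsync(d) between (1) and (2) would be the extra write discussed above.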

If you crash at (3) you can -- at least in principle -- have "link" pointing to garbage (most likely an empty file). That is, the dirent points to the new inode, but the actual link-text got lost with the crash.

Now on modern filesystems, a non-huge symlink will be stored in the inode itself and presumably enjoys some sort of atomicity. But there is nothing in the standard about that.

> If you crash at (3) you can -- at least in principle -- have "link" pointing to garbage (most likely an empty file).

No, I don't think you can, bugs notwithstanding. A "link" (§3.130) is what POSIX calls a directory entry.

> But there is nothing in the standard about that.

The "standard" (POSIX) doesn't talk much about crashing, however if mkdir("a") could destroy "b" – even during a system crash (§3.387), then users would complain.

The rename is atomic; the data being in the file is the problem. There is no guarantee unless you fsync.
Sorry, would you mind clarifying? "The data being in the file"?

The way I understand the article's proposal is this:

1. Create new symlink pointing to desired file (assumed to already exist in a stable state).

2. Move new symlink over old symlink.

Symlinks are just special files whose contents are the link target. His argument is that it is not atomic when considering a server crash. But I don't think that matters anyway.
Symlinks are not files, they do not have inodes and you cannot open them to fsync their contents.

They exist as directory entries with a small in-directory content only, thus syncing the directory they exist in is sufficient to persist them.

Symlinks do have inodes, and their content (the link information) will be stored in that inode structure, if it fits, not in the directory.

See e.g. https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Symb...
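
Independent of the on-disk layout, lstat() reports a file serial number for the link itself, which is easy to check. A small sketch follows; `symlink_serial` is a hypothetical name, and whether that number corresponds to a separate on-disk inode is filesystem-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store in *ino the file serial number lstat()
 * reports for the link itself (not its target).
 * Returns 0 only if `path` is a symbolic link. */
int symlink_serial(const char *path, ino_t *ino)
{
    struct stat st;
    if (lstat(path, &st) != 0)   /* lstat does not follow the link */
        return -1;
    if (!S_ISLNK(st.st_mode))
        return -1;
    *ino = st.st_ino;
    return 0;
}
```

Comparing the result with stat() on the same path shows the link and its target reporting different numbers.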

The owning uid/gid has always been stored in the inode rather than the directory entry. Did symlinks traditionally have identical ownership to the containing directory?
You are confusing hardlinks with symlinks. They are similar but not the same. Symlinks in Linux most certainly get their own inode and can contain data blocks.
Per the documentation this seems to be the case. However, I'd be curious to know if the implementation of the symlink syscall does or does not fsync the new file while in kernel space...

The fact that it does or does not should actually be in the documentation IMHO; probably a good enhancement request.

Symlinks are directory entries, not files, so you need to sync the directory that contains the link.

POSIX does (for some strange reason) permit a symbolic link to have an "inode" (well, d_ino), but there is no way to open this "inode", and no UNIX implementations to my knowledge do this.

Symlinks are implemented as files in Linux: we write the path you are pointing to into the file, so there most certainly is data that must be fsync()'ed if you want it to be persistent.

Edit: If you want to fsync the symlink you can do an open with nofollow iirc (on my phone so I can't check) so you get the actual symlink and not its target.

Yes, but Linux recognises the need to sync the file object when you fsync() the directory that it is in. There is no way to fsync() the symlink directly.
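
A quick way to see this on Linux, as a sketch: `try_fsync_symlink` is a hypothetical helper that attempts the open-with-nofollow idea and then fsync(). On Linux, open() with O_NOFOLLOW on a path whose last component is a symlink fails with ELOOP, so the fsync() is never reached; other systems may report a different errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` without following a final
 * symlink and fsync whatever was opened.
 * Returns 0 on success, -1 with errno set on failure. */
int try_fsync_symlink(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;               /* for a symlink on Linux: errno == ELOOP */
    int rc = fsync(fd);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```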
I'm not sure if this is picking nits, but symlinks have their own inode number.. This ought to classify them as a file.

HOWEVER, looking into it, it seems that the target may actually be stored in the inode. This means the symlink, though a file, has no contents and thus requires no fsync. Does this sound about right?

> I'm not sure if this is picking nits, but symlinks have their own inode number..

No they don't.

First of all, POSIX doesn't use the term "inode number" but "file serial number" which is silly wankery since there's no way to access a file using its "file serial number". They might as well say it was "implementation defined".

Secondly, POSIX used to specify[1] that their inode was "unspecified" because UNIX systems don't store symbolic links in separate inodes, but in the directory that contains them. Now, POSIX specifies symbolic links do have a "file serial number"[2], which could be implemented as a separate file block (confusingly Linux calls this an "inode" which has nothing to do with what UNIX called an inode -- a better term might be "virtual inode" but Linux uses that for something else entirely...)

To this end, I think the only sensible interpretation is the original one: UNIX symbolic links don't have inodes, and POSIX symbolic links might as well have a "hash" of the file contents in the "file serial number" since you can't do anything with it anyway.

[1]: http://pubs.opengroup.org/onlinepubs/009695399/functions/rea...

[2]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/re...

> HOWEVER, looking into it, it seems that the target may actually be stored in the inode. This means the symlink, though a file, has no contents and thus requires no fsync. Does this sound about right?

You have to fsync() the directory that contains the symlink.