Hacker News
by JonnieCache 5486 days ago
I think you're broadly right, but you're missing the key point: Apple are adding a lot more than a lowercase i to FOSS innovations. They're adding usable interfaces, polish, and the shiny lowercase i.

Time Machine is a lot more than branded rsync. It's usable branded rsync. And the usability is what makes it a "game-changer" in the consumer market, where rsync and its algorithmic innovation aren't.

In many ways, it's the same thing Canonical is doing with Ubuntu, except Apple has sweet, sweet luxury-hardware money backing it up, compared to Canonical's comparatively meagre support-contract money (or whatever else it is they do for cash).

Oh, and let us not forget: isn't a key part of the point of OSS that it's OK, nay even encouraged, for people to "steal" the ideas? I thought the concept of stealing an idea, especially a software idea, was supposed to be meaningless. Should the Australian National University have patented rsync back in '96?


Using the idea is one thing, but taking credit for it is another. I think that's where people get hung up. Especially for non-techies who have never heard of (e.g.) rsync, Apple 'invented' versioned backups. This pisses off people who know about rsync, because it gives Apple more credit than they deserve. Apple didn't draw the idea out of thin air and develop it from scratch. They took an existing thing and made it better.
Almost nothing in technology is drawn out of thin air and developed from scratch. If you tried to give credit for every single stepping stone you used to develop anything, all you'd do is confuse the end users, and that's the whole point of what Apple is trying to accomplish: simplify for the user.

Honestly, if I developed rsync, the fact that Apple was using it to power Time Machine would be quite an acknowledgement to me.

I don't imagine that the people with "invented rsync" on their resumés are exactly struggling for income either.

Being able to tell people about your achievements is a lot better than having everyone pre-aware of them anyway IMO.

(It should also be pointed out that Time Machine doesn't actually use rsync at all, or anything like it. It uses a totally different and much simpler system, seemingly built on Apple's own FSEvents API and a load of hard links.)

And in particular, hard links to directories. That is why it is so fast (and also why it only works on HFS+).
Why are hard links to directories that much faster? Creating a bunch of actual directories with hard-linked files in them seems like a trivial difference. They still need to scan the files to figure out what has changed, and that's where the real time sink is, no?
Well, consider the case where nothing has changed: all they do is hard link the top-level directory and they're done.

  2011-06-12 -> 2011-06-13
On my Linux machine, when I do rsync backups, it has to create a new directory entry (a hard link) for every file and a new inode for every directory in the new snapshot, whether or not anything has changed. Even on a moderately sized directory tree this can take an hour or more (you can benchmark it yourself with "cp -al"; try copying your whole disk just for fun). Time Machine completes a cycle in 10 minutes or so, and that includes copying the new data.
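To make the per-file cost concrete, here is a minimal Python sketch of a hard-link snapshot in the spirit of "cp -al" or rsync's --link-dest. It is not rsync's actual code: the function name `snapshot` is hypothetical, and "unchanged" is assumed to mean same size and same mtime in whole seconds (a simplified quick check). Every directory is recreated and every file gets its own link or copy, so the work grows with the total number of entries even when nothing changed.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def snapshot(source: Path, previous: Path | None, dest: Path) -> dict[str, int]:
    """Build a hard-link snapshot of `source` at `dest` (hypothetical helper).

    Files whose size and mtime (whole seconds) match the copy in `previous`
    are hard-linked to it; all other files are copied. Every directory is
    recreated, so the work is proportional to the size of the whole tree.
    """
    stats = {"dirs": 0, "linked": 0, "copied": 0}
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        # A new directory (and so a new inode) even if nothing in it changed.
        (dest / rel).mkdir(parents=True, exist_ok=True)
        stats["dirs"] += 1
        for name in files:
            src_file = Path(root) / name
            new_file = dest / rel / name
            old_file = previous / rel / name if previous is not None else None
            if old_file is not None and old_file.is_file():
                s, o = src_file.stat(), old_file.stat()
                if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                    # Unchanged: a new directory entry pointing at the old inode.
                    os.link(old_file, new_file)
                    stats["linked"] += 1
                    continue
            shutil.copy2(src_file, new_file)  # copies data and timestamps
            stats["copied"] += 1
    return stats
```

Running it a second time against an unchanged tree links every file instead of copying it, but it still performs one operation per directory and per file; that per-entry work is what the "cp -al" benchmark above measures.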

For small numbers of files I agree with you that hard-linked directories would make a trivial difference. Keep in mind that by default Time Machine backs up /usr, /Applications and /System along with your user data, which adds up to quite a lot of files.

That's all just creating the new backup's directory tree. Copying the new data is fast because Time Machine uses the FSEvents API (which is like inotify but better suited to backups), so it knows beforehand exactly what it has to back up (no scanning). If your computer crashes, you switch disks, or your FSEvents database otherwise gets messed up, it does have to scan the whole disk looking for changes, and that can take hours.

So technically it is fast because it uses FSEvents and also because it can hard link directories. If either of those things are missing then the process would be significantly slower.

No. Time Machine uses the FSEvents API so that it only needs to scan directories where files have changed since the last backup, and branches of the filesystem hierarchy whose contents haven't changed can be referenced en masse with a single directory hard link to the previous backup; it's much faster than scanning or re-linking ~1M files and folders, and much more space-efficient (especially the way that hard links are implemented on HFS+).
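The difference between the two strategies can be illustrated with a toy model in Python. It does not touch the filesystem and is not Time Machine's implementation: a tree is a nested dict (files map to None), the change journal is assumed to report changed directories (as the comment above describes for FSEvents), and each directory entry created counts as one operation. `count_entries`, `count_dir_link_ops` and `make_tree` are hypothetical names.

```python
from __future__ import annotations

# A directory maps entry names to a sub-directory (dict) or a file (None).


def count_entries(tree: dict) -> int:
    """Per-file snapshot (cp -al style): one new directory per directory
    plus one hard link per file, whether or not anything changed."""
    ops = 1  # the new copy of this directory
    for child in tree.values():
        ops += count_entries(child) if isinstance(child, dict) else 1
    return ops


def count_dir_link_ops(tree: dict, changed: set[tuple[str, ...]],
                       path: tuple[str, ...] = ()) -> int:
    """Snapshot where untouched subtrees become one directory hard link.
    `changed` holds paths of directories the journal reported as modified."""
    if not any(c[:len(path)] == path for c in changed):
        return 1  # nothing in or below here changed: link the whole directory
    ops = 1  # fresh directory, because something in or below it changed
    for name, child in tree.items():
        if isinstance(child, dict):
            ops += count_dir_link_ops(child, changed, path + (name,))
        else:
            ops += 1  # link the unchanged file, or copy the changed one
    return ops


def make_tree(depth: int, fanout: int, files: int) -> dict:
    """Synthetic tree: `files` files and `fanout` sub-directories per level."""
    tree: dict = {f"f{i}": None for i in range(files)}
    if depth > 0:
        for i in range(fanout):
            tree[f"d{i}"] = make_tree(depth - 1, fanout, files)
    return tree


tree = make_tree(depth=3, fanout=4, files=10)   # 85 dirs, 850 files
print(count_entries(tree))                       # 935
print(count_dir_link_ops(tree, set()))           # 1
print(count_dir_link_ops(tree, {("d0", "d1")}))  # 43
```

With nothing changed, the directory-link strategy does a single operation (the "2011-06-12 -> 2011-06-13" case above), and with one changed directory it only rebuilds the path down to that directory while linking every sibling subtree whole.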