Watchman: Faster builds with large source trees | HN Mirror

	Watchman: Faster builds with large source trees (facebook.com)
	51 points by patangay 4810 days ago

8 comments

evmar 4810 days ago

If I'm reading that graph right, the red line is at 10kmsec, i.e. 10 seconds?

Incremental build times are near and dear to my heart; I spent a lot of time making the Chrome incremental build fast, resulting in this tool: http://martine.github.io/ninja/ . In developing Ninja I was surprised to discover that Linux stat() with a warm disk cache is very fast -- well under 100ms to stat the ~40k source files Chrome uses in its build (see the "node stat" lines here: https://github.com/martine/ninja/wiki/Timing-Numbers ). At its best point I think we got the one-file-changed build/compile/link cycle of Chrome (a ~70mb C++ binary) to around 5 seconds.

Of course, Facebook's problem is surely very different -- their scale could be many more files, and perhaps the programs their engineers run while developing cause their disk caches to flush more frequently. Just found it interesting to worry about the cost of stats.

groby_b 4809 days ago

And of course, there's OSX. See the 58s cold build stat for ninja/Chrome on OSX. And the fact that OSX's cache gets cold whenever you sneeze.

redcircle 4809 days ago

Ninja is awesome

norswap 4810 days ago

What is the difference with guard (https://github.com/guard/guard), except the "service" aspect?

yid 4810 days ago

Not sure how this is different from inotify+md5+make?

wezfurlong 4810 days ago

Watchman uses inotify under the covers (or kqueue or portfs, depending on the OS) and abstracts the differences away.

For the Facebook www build it is no longer practical to hash every file to see if it changed because there are so many that it is pretty common for the files to have fallen out of the buffer cache. Attempting to hash the files can thus lead to a significant amount of I/O and translates directly to an increased wait time for the user.
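For contrast, the inotify+md5+make approach asked about above comes down to something like the sketch below. The function name `changed_files` and the manifest layout are hypothetical, and it assumes GNU `md5sum`, `find`, `sort`, and `comm`. It is not how Watchman works; it only shows that hashing must read every byte of every file on every run, which is the I/O cost described here once files have fallen out of the buffer cache.

```shell
#!/bin/bash
# Sketch only: detect changed files by re-hashing the whole tree.
# changed_files and the manifest format are hypothetical, not part of Watchman.
# Assumes GNU coreutils (md5sum, sort, comm) and find.

# Usage: changed_files TREE MANIFEST
# Prints paths that are new or whose md5 differs from the one recorded in
# MANIFEST, then replaces MANIFEST with the current hashes.
# Deleted files are not reported. MANIFEST must live outside TREE.
changed_files() {
  local tree=$1 manifest=$2 new
  new=$(mktemp) || return 1
  # Reads the full contents of every file under TREE on every run.
  find "$tree" -type f -exec md5sum {} + | LC_ALL=C sort > "$new"
  if [ -f "$manifest" ]; then
    # "hash  path" lines present only in the new listing are new or modified.
    LC_ALL=C comm -13 "$manifest" "$new" | sed 's/^[0-9a-f]*  //'
  else
    # No previous manifest: every file counts as changed.
    sed 's/^[0-9a-f]*  //' "$new"
  fi
  mv "$new" "$manifest"
}
```

A build wrapper might call `changed_files src /tmp/src.md5` before each build and rebuild only what is printed; the cost of that call still grows with the total size of the tree, not with the size of the change.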

In addition, because of the volume of files, it is not feasible for us to statically declare the build dependencies using a traditional Makefile or similar tool; it is crazy to maintain manually and generating the mapping is itself an expensive operation.

We chose to implement this in C because it gave us tight and deliberate control of the resources and dependencies of the service.

ralph 4810 days ago

I often think it would be useful for a filesystem to provide a digest of a file's content. It's the FS that knows when the file's content changes and the digest is out of date, and it also only needs to recalculate if anything asks. It wouldn't necessarily have to read all the bytes of the file to re-calculate; it may be a hierarchical digest where much of the existing, stored, labour can be re-used.
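One way to picture the hierarchical digest is a Merkle-style tree: a file's digest is the hash of its content, and a directory's digest is the hash of its children's names and digests. The sketch below is a userspace illustration only; `tree_digest` is a hypothetical name, it assumes GNU `sha256sum`, it ignores dotfiles and symlink loops, and it recomputes everything on each call. A filesystem implementing the idea would instead store each node's digest and recompute only the path from a changed file up to the root.

```shell
#!/bin/bash
# Sketch only: Merkle-style digest of a directory tree.
# tree_digest is a hypothetical helper; assumes GNU sha256sum.

# Usage: tree_digest PATH
# File: sha256 of its content.
# Directory: sha256 of the "name digest" lines of its entries.
tree_digest() {
  local path=$1 entry
  if [ -d "$path" ]; then
    for entry in "$path"/*; do
      [ -e "$entry" ] || continue   # empty directory: the glob matched nothing
      printf '%s %s\n' "${entry##*/}" "$(tree_digest "$entry")"
    done | sha256sum | cut -d' ' -f1
  else
    sha256sum < "$path" | cut -d' ' -f1
  fi
}
```

Changing one file changes its own digest and that of every ancestor directory, while the digests of sibling subtrees stay the same; those unchanged values are the stored labour that could be re-used.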

wezfurlong 4810 days ago

What would be a good start would be portable and reliable snapshots with differencing managed by the kernel. Some filesystems offer this capability but it is not available to us in this particular circumstance.

wmf 4810 days ago

This is cross-platform, which inotify isn't. Also, isn't inotify pretty low-level? It looks like FB has consolidated common inotify helper code into a reusable daemon.

notacoward 4810 days ago

http://people.gnome.org/~veillard/gamin/ has been doing this for ages, with inotify on Linux and kqueue on BSD. It's already packaged and available - indeed, often required - on many distributions. Facebook has done some neat stuff here, but providing a portable file-change-notification library isn't novel.

_mpu 4810 days ago

Not sure how inotify is different from while+stat?

yid 4810 days ago

Your snark is unwarranted. While+stat in a loop is a poll loop in userspace, inotify is a push mechanism in kernel space.

My point was that the additional functionality of this significantly-sized package beyond running inotify+md5+make in a shell script was unclear.
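To make the poll-versus-push distinction concrete, here is a minimal sketch of the while+stat side. `poll_until_changed` is a hypothetical name, and the `stat -c %Y` form assumes GNU coreutils (BSD and OS X `stat` take different flags). The loop wakes up and calls `stat` on every interval whether or not anything happened, and mtime is compared at one-second granularity; an inotify-based watcher would instead block in the kernel until an event arrives.

```shell
#!/bin/bash
# Sketch only: userspace polling ("while+stat").
# poll_until_changed is a hypothetical helper; assumes GNU stat and sleep.

# Usage: poll_until_changed FILE [INTERVAL_SECONDS]
# Returns once FILE's modification time differs from the value seen at the
# start (also returns if FILE disappears, because stat then prints nothing).
poll_until_changed() {
  local file=$1 interval=${2:-1} before
  before=$(stat -c %Y "$file") || return 1
  # Each iteration costs one stat() call, even when nothing has changed.
  while [ "$(stat -c %Y "$file" 2>/dev/null)" = "$before" ]; do
    sleep "$interval"
  done
}
```

With tens of thousands of files the same loop has to stat each one per pass, so the cost scales with the size of the tree, not with the number of changes.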

Terretta 4810 days ago

Trivial example use of inotify (or rather, inotifywait) to restart Apache after you edit the Apache conf file or add/edit/remove a site config:

    # inotifywait merges all -e lists and applies them, and -r, to every path given.
    while inotifywait -e attrib,modify /etc/httpd/conf/httpd.conf -e attrib,modify,create,delete,move -r /etc/httpd/sites-enabled ; do
      /sbin/service httpd graceful && echo "$(date -u --rfc-3339=seconds) httpd graceful" >> /etc/httpd/conf/httpd-conf.log
    done

This shows how to monitor a single file and a directory, and run a sequence of commands when events happen. Unlike while+stat, it doesn't spam the file system with checks for changes; it waits until the kernel reports that a change happened.

To "daemonize" this, tack on an invocation test, perhaps:

    #!/bin/bash
    # First invocation: re-launch this script in the background with "--", then exit.
    if [ "x$1" != "x--" ]; then
      "$0" -- 1> /etc/httpd/conf/watchconf.log 2> /etc/httpd/conf/watchconf-err.log &
      exit 0
    fi
    
    # Background copy: block until a watched path changes, then reload Apache.
    while inotifywait -e attrib,modify /etc/httpd/conf/httpd.conf -e attrib,modify,create,delete,move -r /etc/httpd/sites-enabled ; do
      /sbin/service httpd graceful && echo "$(date -u --rfc-3339=seconds) httpd graceful" >> /etc/httpd/conf/httpd-conf.log
    done

hildegard 4810 days ago

I was just looking into inotify to reload nginx and various other services on conf changes. I guess I'll go with Watchman so I don't have to write a series of shell scripts.

I considered guard (https://github.com/guard/guard) and Nodemon (https://github.com/remy/nodemon), but Watchman has fewer dependencies (it doesn't require Ruby or Node).

There's also Supervisor (Python) (http://supervisord.org/), but I think that is more process management. I'm not sure if it can do file-watching as simply as Watchman.

Game_Ender 4809 days ago

Not directly applicable, but if you are using Salt to manage your servers, you can set it up to restart a service when certain files change with the watch requisite [1].

[1] - http://docs.saltstack.com/ref/states/requisites.html#require...

rejuvenile 4810 days ago

The tup build system also does this.

thelarry 4810 days ago

Compiling while editing. Intredasting.... I assume there is some crazy dependency algorithm.

kevingadd 4810 days ago

Traditional build systems are all dependency algorithms. The interesting thing here is that they run a service to track file modifications so it's not necessary to stat() the entire hard drive to find out what has changed, so you can do incremental rebuilds very cheaply.

b0b0b0b 4809 days ago

This reminds me of Eclipse's support for native hooks to detect file updates (in case something was changed from outside the eclipse session). Watchman would then be like a deconstructed ide.

livingparadox 4810 days ago

I read their problem statement, describing the features they needed. Seemed like Git could solve their problems, with a few well written hooks.

raylu 4809 days ago

Only if you want to commit on every iteration.