ClojureScript Builds, Rebooted

Boot looks interesting, but replacing an immutable data structure with side-effectful functions feels like a step in the wrong direction.

The main reason why build tools exist is to perform transformations on a file set. The vast majority of the things build tools do that are useful are side effects.

These "side-effectful" functions are basically transducers; the reduction is performed on the file set instead of a sequence. They are mostly stateful transducers, but this does not fly in the face of functional programming. The opposite, in fact! They facilitate separation of concerns in a way that an immutable but global configuration map cannot.

It's true that you need to interact with the filesystem eventually, but the longer that can be deferred, the more complexity can be avoided.

Ideally I'd want to construct a functional data structure that describes my build process, and at the end pass it to a side-effectful function to produce a change to the filesystem. Boot appears to be side-effectful from the get-go, but perhaps I'm mistaken about how it operates?

Sorry, I should have mentioned that while boot, like any JVM build tool, does begin and end with the class path, we have spent an enormous amount of time experimenting with ways to mitigate the effects. We ended up with a system that provides many of the benefits of immutability while still living in the real world where files actually exist.

Here are some of the things boot provides:

1. We have "pods", which are separate Clojure runtimes in isolated class loaders in which you can evaluate expressions. The actual building occurs in these things. They are lexically scoped and can have a different class path than the main Clojure runtime where your build pipeline runs.

2. Files emitted during the course of the build are created in temp dirs managed by boot. There are a few different kinds of these temp dirs, one of which is lexically scoped. We also have temp dirs that are effectively immutable from a given task's point of view (we use a copy-on-write scheme to achieve this).

3. We make liberal use of hard links and directory syncing to emulate immutability wherever we can. Boot provides a kind of structural sharing with these hard links that really makes the pain of dealing with files go away.

4. We put a great deal of thought into how artifacts flow through the build pipeline, and how tasks that don't know anything about each other can cooperate to work on these files.

This is the most interesting part of boot for me, and I'll be making a complete writeup about it soon.
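The hard-link sharing in point 3 can be sketched in plain Clojure with `java.nio.file`. This is only an illustration of the underlying mechanism, not boot's implementation: `temp-dir`, `link-into!` and `read-path` are hypothetical helpers, and it assumes a filesystem that supports hard links.

```clojure
(import '(java.nio.file Files Path)
        '(java.nio.file.attribute FileAttribute))

;; Hypothetical helper: create a fresh temp directory.
(defn temp-dir ^Path [^String prefix]
  (Files/createTempDirectory prefix (make-array FileAttribute 0)))

;; Hypothetical helper: expose `src` inside `dest-dir` under the same
;; file name via a hard link, so no bytes are copied.
(defn link-into! ^Path [^Path src ^Path dest-dir]
  (Files/createLink (.resolve dest-dir (.getFileName src)) src))

(defn read-path [^Path p]
  (slurp (.toFile p)))

(def dir-a (temp-dir "task-a"))
(def dir-b (temp-dir "task-b"))

;; One task writes a file into its own directory...
(def original (.resolve ^Path dir-a "main.js"))
(spit (.toFile ^Path original) "console.log('hi');")

;; ...and another directory shares it without a copy.
(def linked (link-into! original dir-b))
(read-path linked)
```

Because both names point at the same underlying file, sharing is cheap, which is also why the directories have to be treated as immutable (or copy-on-write) for this to be safe.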

It sounds like Boot has had a lot of thought put into it, and I certainly welcome the idea of effectively immutable directories.

But it also looks like Boot dives head first into I/O, when I'd prefer a build tool that is a little more circumspect about complexity. While I welcome competition to Leiningen, and I'll certainly keep an eye on Boot, my initial impression is that it's heading in the opposite direction to where I'd want a build tool to go.

I'm of the mind that build tools aren't going to get better unless we forcibly insert instrumentation between our build tasks and the stateful resources upon which they depend. To that end, I'd like to hear more about both your Clojure runtime isolation and filesystem isolation mechanisms.

All JVM build tools are side-effectful from the get-go, and they really have to be. Consider the :dependencies and :source-paths keys in a Leiningen project.clj. The purpose of these is to manipulate the mutable class path. To have a JVM build tool that doesn't revolve around the class path will require a complete reinvention of the JVM ecosystem and all of the existing tooling (like in our demo we use the Google Closure compiler, which mutates all kinds of things–that would have to go), which is, I'm sure, never going to happen.

You need to eventually be side-effectful, but that doesn't mean you need to start side-effectful. The :dependencies in a Leiningen project map are just a data structure until they're passed to eval-in-project, which happens at the end of a chain of functional operations.

One of the core ideas of Clojure is that we should try to favour simple solutions over complex ones. Side-effectful functions are some of the most complex tools we have, and while they are necessary eventually, it would be nice to have the majority of the code-base work with simple data structures, and push the complexity of I/O out to the edges of the application.
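A minimal sketch of the "data first, I/O at the edges" style being argued for here. None of this is Leiningen's or boot's API: `add-output`, `plan` and `write-build!` are hypothetical names, and a build is assumed to be nothing more than a map of relative paths to file contents.

```clojure
(require '[clojure.java.io :as io])

;; Pure: a build is plain data, a map of relative output path -> content.
(defn add-output [build path content]
  (assoc-in build [:outputs path] content))

;; Pure: describe the whole build without touching the filesystem.
(defn plan []
  (-> {:outputs {}}
      (add-output "js/main.js" "console.log('hi');")
      (add-output "index.html" "<script src=\"js/main.js\"></script>")))

;; Effectful edge: the only function that performs I/O.
(defn write-build! [build dir]
  (doseq [[path content] (:outputs build)]
    (let [f (io/file dir path)]
      (io/make-parents f)
      (spit f content))))
```

Everything up to `write-build!` can be inspected, diffed and tested as ordinary data; only the last call changes the filesystem.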

Side-effects aren't necessarily bad, it's composability that is good. Purity is a composable property, but there are plenty of composable side-effects.

For example, let's ignore all other effects besides file IO (especially ignoring internet IO). Further, let's assume that our only IO operations are `(read path) => data` and `(write path data) => nil`. Let's also assume that both of these operations are atomic (i.e. you can't perceive a half-written file). If a build task attempts to read a file that doesn't exist, that task pauses itself. When a file is written, any tasks waiting on that file are resumed. If you re-write a file, it has to exactly match the already written file, or the build fails. To kick off a build, you wait on one or more files and then start one or more tasks.

Viewed this way, the file-system is a monotonic logic variable. Yes, the programming model is effectful, but there is a composable property: build repeatability. Just as you can compose arbitrary pure functions and get a pure function out, you can take any two arbitrary graphs of these constrained IO build tasks, compose them together, and the resulting larger graph will also be a repeatable build.
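A rough in-memory sketch of that model, assuming each path is backed by a Clojure promise. `new-fs`, `read-file` and `write-file!` are hypothetical names for the `read` and `write` operations above, and no real files are involved.

```clojure
;; The "filesystem" is an atom holding a map of path -> promise.
(defn new-fs [] (atom {}))

;; Return the promise for `path`, creating it on first use.
(defn- slot [fs path]
  (-> (swap! fs update path #(or % (promise)))
      (get path)))

;; Blocks the calling thread until some task has written `path`.
(defn read-file [fs path]
  @(slot fs path))

;; Write-once: the first write delivers the value. A later write must
;; match the existing content exactly, otherwise the build fails.
(defn write-file! [fs path data]
  (let [p (slot fs path)]
    (when-not (deliver p data)
      (when (not= @p data)
        (throw (ex-info "Conflicting rewrite" {:path path}))))
    nil))

(def fs (new-fs))

;; A task that depends on a.txt; it pauses until the file exists.
(def task (future (str (read-file fs "a.txt") "!")))

(write-file! fs "a.txt" "hello")
@task ;=> "hello!"
```

Since a path can only ever go from "unwritten" to one fixed value, the order in which tasks happen to run cannot change what any of them reads.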

dustingetz 4242 days ago

It's not at all clear to me that you are correct.

I'm glad that people are putting thought into the cljs build process though, to this day it is still a particularly un-fun part of clojurescript.

I think that the immutable data structure of lein is largely superficial. Once the final project map is built, all bets are off for the effectful behavior of downstream tasks/plugins. The only clear benefit from it is that you can capture a "complete" configuration with `lein pprint`. However, it does introduce significant cost in terms of the declarative configuration whack-a-mole, where you never quite know what magic ^:replace or :special-key to include to get what you want.

If you're not going to attempt to fix this superficiality by pushing immutable data deeper into the system, then it makes perfect sense to discard it completely.

michaniskin 4241 days ago

Totally agree.

Also, in boot we don't want to obtain a complete configuration before tasks run, because tasks can participate in the process. In boot a task can add dependencies or call other tasks, etc. This is why we need only one abstraction (tasks).

Just like in a Clojure program you don't have a complete configuration of values in variables before the program runs, because it's the program that creates those values. Boot figures so much out on its own that `boot pprint` isn't even a thing you would want.

Consider the hoops that needed to be jumped through to get the Maven wagons system implemented in Leiningen. In boot it's a non-issue–the environment is dynamic so you can just install wagon deps and then in the next expression install the deps and repositories that depend on that wagon. We didn't need to make any changes to boot to accommodate them.

I still don't understand what the "task abstraction" is or what it provides. It seems to me that it's simply a Clojure function with a corresponding command line interface. Is that fair? Does it do something else too?

If it's just a function, I don't know why command-line support is valuable. For interactive use, the Clojure REPL is just fine. For automated use, you only need a single shell utility like perl or awk to evaluate an expression or run a script.

michaniskin 4241 days ago

Yes, in the post I didn't get too technical with the treatment of tasks; I'll elaborate a little here. First, the command line thing.

You're right that the command line isn't strictly required to use boot, you can do everything at the REPL or perl/awk etc., as you pointed out. But for me it's really useful just ergonomically to be able to use command line arguments to configure ad-hoc builds because they can be very concise. Just like I probably wouldn't be super excited to use a Clojure shell instead of Bash, because Lisp is usually more verbose for the kinds of things I do on the command line. Consider:

    $ boot cljs -usO none

    (boot (cljs :unified true :source-map true :optimizations :none))

the command line version is just nicer for that. When it's time to automate is when you'd put that in your build.boot and make it a new task.

This brings us to tasks. We used to describe them as "middleware factories" but Rich has provided us with a way cooler name: stateful transducers. The build process can be imagined as a transducer stack applied to a file set instead of to an async channel or sequential thing. The principal value of the task abstraction is its process-building power.

A typical task definition looks like this:

    (deftask foo
      "This task does foo."
      [...]                  ; kwargs/cli-opts
      (let [state ...]       ; local state
        (fn [continue]       ; middleware
          (fn [event]        ; handler
            ...              ; build something, do work
            (continue event) ; call continuation
            ))))

Please ignore "event", it's there for historical reasons. But like transducers, we have powerful ways to build processes from tasks that don't need to know anything about each other now. For example:

    (deftask bar
      "This task does bar."
      [...]
      (comp
        (fn [continue]
          (fn [event]
            ...
            (continue event)))
        (foo :bar "baz" :baf "quux")
        (omg :hello "world")))
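The middleware shape can be tried out without boot at all. The following is a sketch in plain Clojure, assuming the "file set" is just a vector of file names; `add-file` and `upcase-all` are hypothetical stand-ins for tasks, not boot functions.

```clojure
(require '[clojure.string :as str])

;; Hypothetical task: adds one file name to the file set.
(defn add-file [name]
  (fn [continue]
    (fn [fileset]
      (continue (conj fileset name)))))

;; Hypothetical task: transforms whatever files it is handed,
;; without knowing which tasks produced them.
(defn upcase-all []
  (fn [continue]
    (fn [fileset]
      (continue (mapv str/upper-case fileset)))))

;; As with transducers, `comp` yields a pipeline whose stages
;; run left to right.
(def pipeline
  (comp (add-file "a.js")
        (add-file "b.js")
        (upcase-all)))

;; `identity` acts as the final continuation.
((pipeline identity) []) ;=> ["A.JS" "B.JS"]
```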

A key property of transducers is that they can also perform process control flow duties. The boot `watch` task, for instance, is a totally general-purpose way to do incremental-anything in boot. The `cljs` task doesn't have a file watcher in it, and neither do any of the other tasks. They don't need it.
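To make the control-flow point concrete, here is a sketch of a task that calls its continuation more than once. It is not boot's `watch`: `replay` is a hypothetical stand-in that replays a fixed list of file-set snapshots instead of watching a directory, and `compile-log` is a hypothetical downstream task that merely records what it was asked to build.

```clojure
;; Hypothetical stand-in for a watcher: ignores its input and calls
;; the rest of the pipeline once per snapshot.
(defn replay [snapshots]
  (fn [continue]
    (fn [_]
      (doseq [fileset snapshots]
        (continue fileset)))))

;; Hypothetical downstream task. It contains no watching logic; it
;; simply runs every time it is called.
(defn compile-log [log]
  (fn [continue]
    (fn [fileset]
      (swap! log conj (vec (sort fileset)))
      (continue fileset))))

(def log (atom []))

(def pipeline
  (comp (replay [#{"a.cljs"} #{"a.cljs" "b.cljs"}])
        (compile-log log)))

((pipeline identity) #{})

@log ;=> [["a.cljs"] ["a.cljs" "b.cljs"]]
```

The downstream task ran twice only because the upstream task decided to call it twice, which is the sense in which incremental rebuilding can live in one place.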

Another example is the `cljs-repl` task, which emits ClojureScript code when you start the CLJS REPL. This requires recompiling the JS file and reloading the client. This all happens automatically because the cljs-repl task can call its continuation whenever it likes, so it does that when you start the REPL. This means that your webapp code doesn't contain any REPL connecting code, so you don't have to think about removing it for production builds etc. The REPL connecting code is in there when you use the task, and not when you don't. Very clean.

Another interesting property of tasks is that they accept only keyword arguments. They do not take positional parameters. This means that partial application of tasks is idempotent, and that last-setting wins. For instance, given a function f that takes no positional parameters, we have:

    (-> f (partial :foo "bar")) ==
    (-> f (partial :foo "bar") (partial :foo "bar"))

and

    (-> f (partial :foo "bar")) ==
    (-> f (partial :foo "baz") (partial :foo "bar")).
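Both equivalences can be checked in plain Clojure, assuming a hypothetical `f` that collects its keyword arguments into a map and returns it:

```clojure
;; Hypothetical f: takes only keyword arguments and returns them as a map.
(defn f [& {:as opts}] opts)

;; Idempotent: applying the same option twice changes nothing.
(def once  (-> f (partial :foo "bar")))
(def twice (-> f (partial :foo "bar") (partial :foo "bar")))

;; Last setting wins: the option applied later ends up later in the
;; argument list, so it overrides the earlier one.
(def overridden (-> f (partial :foo "baz") (partial :foo "bar")))

(once)       ;=> {:foo "bar"}
(twice)      ;=> {:foo "bar"}
(overridden) ;=> {:foo "bar"}
```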

This is pretty interesting because it gives us a nice way to manage global preferences. We have a macro called task-options! which can be used to globally apply options to tasks:

    (task-options!
      foo [:bar "baz"
           :baf "quux"]
      omg [:hello "world"])

This macro actually does some currying and alter-var-root, replacing the value of the task var (deftask defines a var, of course) with a curried version. A cool thing about this is that the last-setting-wins property means that you can override these settings on the command line or in the REPL:

    (boot (foo :bar "not-baz") (omg))

which would override the :bar option, but not the others.
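The mechanism can be approximated in a few lines, assuming tasks are ordinary functions of keyword arguments. This is a sketch and not the actual task-options! macro: `set-defaults!` is a hypothetical helper, and `foo` here just returns its options.

```clojure
;; Hypothetical task: returns the options it was called with.
(defn foo [& {:as opts}] opts)

;; Hypothetical helper: replace the var's root value with a version
;; of the task that has the given options partially applied.
(defn set-defaults! [task-var & kvs]
  (alter-var-root task-var (fn [task] (apply partial task kvs))))

(set-defaults! #'foo :bar "baz" :baf "quux")

(foo)                ;=> {:bar "baz", :baf "quux"}
(foo :bar "not-baz") ;=> {:bar "not-baz", :baf "quux"}
```

Arguments supplied at the call site come after the defaults in the argument list, so they override them under the last-setting-wins rule.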

This is probably long enough, hahaha! I'll hand the mic back to you now :)

> it's really useful just ergonomically to be able to use command line arguments to configure ad-hoc builds

It's largely also what contributes to "works for me" build environments... It's better to have a just-one-way-to-do-it interface and discourage excessive tinkering with parameters. The more parameters, the more likely for your dev env to be unstable across individual checkouts or developers. I know it's idealistic, but I think we should strive for zero-arg builds, which oddly means not making it easier to configure them.

I'll have to think on all the other stuff you wrote, since it's not totally clear to me yet. I may ping you again after I noodle a bit.

lynndylanhurley 4242 days ago

How will this work with tools like figwheel [1] and austin [2]?

1. https://github.com/bhauman/lein-figwheel

2. https://github.com/cemerick/austin

The post was a demonstration of exactly that: cljs incremental builds with live-reload and cljs browser repl.