> removing side-effects when possible will always be better than merely restricting them

I disagree. Effects are a very natural mental model for a great many problems, and constraining yourself to purity is both impractical and quickly hits diminishing returns. Furthermore, if you can intercept effects, you can impose purity upon them. For an extreme example, consider application virtualization and containers such as Docker. By intercepting the system call table, you can create a "pure" filesystem from the view outside the container. At the other extreme, take a look at "extensible effects" and the Eff language, which lets you stub any subset of the effects available down to the individual expression!

> If the tasks produce data structures that describe the build process, it's much easier to intercept and prevent or reorder the cleanup process.

If you intercept all file IO, you can recover the same data. The only difference is whether or not you know that data upfront.

> Functions, especially ones that are Turing-complete, are notoriously opaque.

This is true! However, there are a great many build processes that do not know what they depend on or what they will produce until they do some Turing-equivalent work. For example, scanning a C header to find #include statements. Rather than try to shoehorn all data into a declarative model, we need both 1) a fully declarative model and 2) the ability to recover a declaration from the trace of an imperative process.

An example of this trick, employed manually, is the notorious .d Makefiles. The C compiler finds all the dependencies and produces a submake file with the .d extension, then make restarts recursively using the new .d file as part of the dependency graph. However, it's a very unnatural way to think about the problem, and it leads to complex multi-pass build processes that are necessarily slower.

Instead, the dependency graph could be produced as a side-effect of simply doing the compilation, and that graph could be used as part of a higher-level declarative framework.
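To make the "recover a declaration from the trace" idea concrete, here is a rough Python sketch. Everything in it is hypothetical: `TracingReader`, `preprocess` and `build_with_trace` are names made up for illustration, the "compile" step is a toy that only inlines quoted #include lines, and include paths are assumed to be relative to a single root directory. The point is just that the dependency graph falls out of recording the reads.

```python
import re
from pathlib import Path

# Matches lines such as: #include "foo.h" (quoted includes only, by assumption).
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+"([^"]+)"', re.MULTILINE)


class TracingReader:
    """Hypothetical file reader that records every path it is asked for."""

    def __init__(self, root):
        self.root = Path(root)
        self.reads = []

    def read(self, relpath):
        self.reads.append(relpath)
        return (self.root / relpath).read_text()


def preprocess(reader, relpath, seen=None):
    """Toy imperative 'compile' step: recursively inline quoted includes."""
    seen = set() if seen is None else seen
    if relpath in seen:  # crude include guard: inline each file at most once
        return ""
    seen.add(relpath)
    text = reader.read(relpath)
    return INCLUDE_RE.sub(lambda m: preprocess(reader, m.group(1), seen), text)


def build_with_trace(root, source):
    """Run the step and recover a declarative dependency record from its trace."""
    reader = TracingReader(root)
    output = preprocess(reader, source)
    deps = {source: sorted(set(reader.reads) - {source})}
    return output, deps
```

Calling `build_with_trace(root, "main.c")` returns both the output text and a plain dictionary mapping the source to the files it ended up reading, which is roughly the information a .d file carries, but obtained in a single pass.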
Let's forget about all other considerations and instead consider the simplest possible build system we can conceive. This build system should take a directory structure of source files, and produce a directory structure of output files.
If our sole consideration is simplicity, we might construct a build system like:
So we take every file in the current working directory, read everything into memory, perform some functional transformation that produces a data structure of output files, then write that to disk. This minimises I/O, and gives us a functional data structure to play around with. It's a naive approach, and one made without regard for memory or efficiency, but given that the amount of memory on a modern machine is far larger than the source directory is likely to be, it actually seems feasible.
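A minimal Python sketch of the pipeline just described, under some assumptions: the function names (`read_tree`, `write_tree`, `build`, `upper_txt`) are hypothetical, the file tree is modelled as a plain dictionary from relative path to bytes, and the source and output directories are passed in explicitly rather than using the current working directory.

```python
from pathlib import Path


def read_tree(root):
    """Read every file under root into memory as {relative path: bytes}."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_tree(root, files):
    """Write a {relative path: bytes} mapping out to disk under root."""
    root = Path(root)
    for rel, data in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


def build(src, dst, transform):
    """The whole build: read everything, apply a pure function, write everything."""
    write_tree(dst, transform(read_tree(src)))


def upper_txt(files):
    """Hypothetical transformation: upper-case every .txt file into a .out file."""
    return {
        name[: -len(".txt")] + ".out": data.upper()
        for name, data in files.items()
        if name.endswith(".txt")
    }
```

With these definitions, `build("src", "out", upper_txt)` is the entire build. The only effectful parts are `read_tree` and `write_tree`; the transformation itself is an ordinary function from one dictionary to another.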
However, we can also consider optimisations that don't alter the behaviour. For instance, we could only read in files when their contents are accessed. In order to protect against changes, we could check the modification date, and abort if it changes. It's a compromise, but a small one.
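One way the lazy variant might look, again as a hedged Python sketch: `LazyTree` and `SourceChanged` are made-up names, and using the modification time as the change check is exactly the compromise mentioned above (a change that preserves the timestamp would go unnoticed).

```python
from pathlib import Path


class SourceChanged(Exception):
    """Raised when a file was modified after the tree was snapshotted."""


class LazyTree:
    """Hypothetical read-only view of a directory that reads files on demand."""

    def __init__(self, root):
        self.root = Path(root)
        # Record modification times up front; no file contents are read yet.
        self._mtimes = {
            p.relative_to(self.root).as_posix(): p.stat().st_mtime_ns
            for p in self.root.rglob("*")
            if p.is_file()
        }
        self._cache = {}

    def names(self):
        return sorted(self._mtimes)

    def __getitem__(self, name):
        if name not in self._cache:
            path = self.root / name
            data = path.read_bytes()
            # Check after reading, and abort if the file changed since the snapshot.
            if path.stat().st_mtime_ns != self._mtimes[name]:
                raise SourceChanged(name)
            self._cache[name] = data
        return self._cache[name]
```

From the point of view of the transformation, `tree["a.txt"]` behaves like a lookup in the fully loaded dictionary; the difference is only that the read happens on first access and the build aborts if the file was touched in between.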
We might also conceive of a system where the contents of the file are memory mapped, or held in some temporary file, or any number of clever ways to avoid keeping the file in memory while not breaking the integrity of the data structure.
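As a rough illustration of the memory-mapped option, here is a small Python sketch using the standard `mmap` module. The function name `map_tree` is hypothetical, and the special case for empty files is there because a zero-length file cannot be mapped.

```python
import mmap
from pathlib import Path


def map_tree(root):
    """Map every file under root read-only as {relative path: bytes-like object}."""
    root = Path(root)
    mapped = {}
    for p in sorted(root.rglob("*")):
        if not p.is_file():
            continue
        name = p.relative_to(root).as_posix()
        if p.stat().st_size == 0:
            mapped[name] = b""  # an empty file cannot be memory mapped
            continue
        with open(p, "rb") as f:
            # Length 0 means "map the whole file"; pages are loaded on access.
            mapped[name] = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mapped
```

Slicing a mapped entry, for example `mapped["a.txt"][:5]`, gives ordinary bytes, so a transformation written against the in-memory dictionary needs little or no change. The maps should be closed once the build is done.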
This is just a toy example, and it is lacking in many areas, such as network I/O, but it's easier to start simple and add complexity when necessary than it is to start from an assumption of complexity and try to work backward to simplicity. This is why I think it's incorrect to start with side-effectful functions, because that means starting from complexity.