|
Hi, author here. Sorry about the confusion - this blog post was meant as a more programming-language-themed introduction to the project (discussion on r/programminglanguages is here: https://www.reddit.com/r/ProgrammingLanguages/comments/12im9...). As such, it doesn't say much about what you'd actually use this for: tracking computational experiments, for example.

As for the part you highlighted, I was looking for a two-sentence summary that conveys the main features of the project... well, I guess I succeeded with the two-sentence part! :) Let me try to unpack this in a hopefully saner format:

- the whole thing is based on memoization. You put a memoization decorator on all the functions whose results you want to persist/reuse, and you compose entire programs by calling such functions on the outputs of other functions. In between calls, you can also mix in data structure operations - so if a function returns a list, you can call another memoized function on just one element of that list.
- as such programs execute, some metadata is passed around that links together the inputs and outputs of each call, and the elements of each data structure. This dynamically builds a computational graph of the program behind the scenes.
- why would you need this graph? One reason is that there is a principled way to extract a SQL query from this graph that pattern-matches all analogous computational graphs that have already been memoized. This gives you a flexible and natural interface for the questions you usually ask of your programs - "How do these outputs depend on these inputs across all the experiments of this kind?" - without writing extra code.
- since memoized results can go stale when you change your code, there is also a versioning system that tracks the dependencies that each memoized call accessed, and alerts you when a dependency changes.
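
To make the first two points a bit more concrete, here is a toy sketch of memoization that also records which values each call consumed and produced. This is not the project's actual API: `memo`, `Ref`, `STORE`, `CALLS` and `outputs_of` are hypothetical names made up for illustration, everything lives in memory, and the versioning, data-structure linking and SQL extraction are left out entirely.

```python
import functools
import hashlib
import pickle

STORE = {}   # call key -> Ref holding the memoized output
CALLS = []   # (function name, input uids, output uid): edges of the graph


class Ref:
    """A value plus an id derived from its content (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.uid = hashlib.sha256(pickle.dumps(value)).hexdigest()[:12]


def memo(func):
    """Memoize `func` and record each new call as an edge in CALLS."""
    @functools.wraps(func)
    def wrapper(*args):
        refs = [a if isinstance(a, Ref) else Ref(a) for a in args]
        key = (func.__name__, tuple(r.uid for r in refs))
        if key not in STORE:
            out = Ref(func(*(r.value for r in refs)))
            STORE[key] = out
            CALLS.append((func.__name__, key[1], out.uid))
        return STORE[key]
    return wrapper


def outputs_of(name):
    """Every recorded call to `name`, as (input uids, output uid) pairs."""
    return [(ins, out) for fn, ins, out in CALLS if fn == name]


@memo
def make_data(n):
    return list(range(n))


@memo
def square(x):
    return x * x


data = make_data(3)             # Ref wrapping [0, 1, 2]
last = square(data.value[-1])   # memoized call on a single list element
again = square(2)               # same input content, so the stored Ref is reused
```

Here `again is last` holds, and `CALLS` contains one edge per distinct call, which is the raw material a query layer could search over. A real system would presumably persist both tables and track much more metadata than this.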

Tracking dependencies dynamically on a per-call basis, rather than statically on a per-function basis, gives you more opportunities to reuse computation automatically. For example, a function can have two branches that rely on different dependencies. If only one of those dependencies changes, the calls that went through the other branch are not recomputed.

The "content-versioned" part refers to how the system recognizes which version a function is at: by looking at its code (content), rather than by you explicitly giving it some arbitrary version name/number. This means that, for example, you can "go back in time" for one or more functions by restoring the old code, and the storage will recognize that it's back in that world.

I hope that helps clarify things, and thanks a lot for bringing this up. |