Hacker News new | ask | show | jobs
by brendanyounger 886 days ago
I wish people would stop referring to Datomic as datalog. Datomic is many things, but only the query format (Horn clauses with unification of variables, similar to prolog) has anything to do with datalog.

Real datalog is far more interesting since it implicitly encodes recursion allowing you to chain rules. Rule A derives new facts, which rule B uses to derive new facts, which rules A and C use to derive new facts, and so on. Datomic has a notion of rules which are mostly syntax sugar and do not support this sort of recursive reasoning.

Why is that a big deal? When rules are run automatically, you can build live, reactive systems, not just a database that sits around waiting for you to query it. Hellerstein's work at UC Berkeley (https://dsf.berkeley.edu/papers/sigrec10-declimperative.pdf) explores this in some detail.

2 comments

> Datomic has a notion of rules which are mostly syntax sugar and do not support this sort of recursive reasoning.

> Why is that a big deal? When rules are run automatically, you can build live, reactive systems, not just a database that sits around waiting for you to query it.

There was at least one serious attempt to bring these worlds together: https://github.com/sixthnormal/clj-3df

Sounds cool. What's the complexity of running this kind of recursive reasoning? Reasonable? Can you suggest any tools to not have to implement it ourselves?
Souffle and Cozo mentioned below already implement the whole of "traditional" datalog.

Percival (https://github.com/ekzhang/percival) has some very nice examples showing how you can interactively write and test rules on top of a datalog interpreter.

Bud (http://bloom-lang.net/bud/) is Hellerstein's proof of concept playground. It has bit-rotted in the past few years, but the examples are readable even if you can't easily get it working.

The complexity can be quite good. You can syntactically determine when you've written linear recursion (equivalent to a for loop) vs not. Otherwise, the complexity is what you'd expect from incremental view maintenance in a normal SQL database. Which is to say O(n^k) with k being the number of relations joined, but usually much, much less with appropriate indexes and skew in the data. All the usual tricks concerning data normalization and indexes from databases apply.

RDFox offers a rather impressive sounding Datalog inferencing engine: https://www.oxfordsemantic.tech/rdfox

> We present a novel approach to parallel materialisation (i.e., fixpoint computation) of datalog programs in centralised, main-memory, multi-core RDF systems. Our approach comprises an algorithm that evenly distributes the workload to cores, and an RDF indexing data structure that supports efficient, ‘mostly’ lock-free parallel updates.

> Materialisation is PTIME-complete in data complexity and is thus believed to be inherently sequential. Nevertheless, many practical parallelisation techniques have been developed [...]

There have been several papers and patents describing their approach, e.g. http://www.cs.ox.ac.uk/dan.olteanu/papers/mnpho-aaai14.pdf