Hacker News new | ask | show | jobs
by mmarx 1519 days ago
No, Datalog always has a finite (though potentially exponentially large) grounding, since no rule can introduce new values to the database (i.e., it has no “value invention”/“function terms”/“existential quantification”).

Some form of stratification is usually required when you add Negation (which the demo also doesn't seem to have). But even here it's not because the language would become Turing complete, but rather to ensure that it has a consistent semantics. Consider, e.g., a program with two rules: `A :- not B`, `B :- not A`. Unlike Datalog programs, this does not have a unique minimal model, since both `{A}` and `{B}` are minimal models. Stratification is then a syntactic restriction that prevents such situtations.

1 comments

But what about potential loops, as in a cyclic graph. How to make sure they terminate?
For a Datalog program and a specific database, there is always a finite set of facts the program can possibly derive (that is the “finite grounding” from above), obtained in the following way: for every predicate that appears in the “head” of a rule (i.e., any predicate for which facts are potentially derived by the program), instantiate it with all combinations of values that occur in the database or (as constants) in the program (the so-called “active domain”). In general, this will be exponentially many facts (with respect to the size of the program), and polynomially many facts (with respect to the size of the database).

Entailment in Datalog is monotonic in the sense that once a fact has been derived by the program, no further rule application can invalidate this fact (this is no longer true when adding negation, and is indeed what we recover by stratifying).

Taken together, this means that you can always compute all facts derived by a Datalog program on a database by computing the least fixed point under immediate consequences (roughly “apply all rules and add the derived facts”; this approach is called “naive evaluation” in the literature, in practice, more optimised evaluation strategies are useful), and you are guaranteed to obtain this after polynomially many (with respect to the database) or exponentially many (with respect to the program) steps – every step derives at least one fact; once you have reached a state where there are no more immediate consequences, you have derived every fact that you will ever derive and can stop.

very insightful, thanks!