Hacker News new | ask | show | jobs
by tmptmpgo 1297 days ago
Because syntactic doesn’t mean semantic subset. Datalog uses bottom-up evaluation by default while Prolog uses top-down. As such they are computationally very different.
1 comments

Yes well, as you mention, that is only the default execution strategy, and nothing prevents us from using other execution strategies for either language. In fact, the main advantage of using a declarative language such as Prolog or Datalog is precisely that it can be readily interpreted with different execution strategies, and indeed many Prolog implementations already provide alternative execution strategies, the most common of which is currently SLG resolution, also known as tabling. Alternative execution strategies are applicable as long as you keep to the pure monotonic core of the language.

The fact that the default execution strategies of Prolog and Datalog differ is not a valid argument for or against using a slightly different syntactic formalism. If, as seems plausible, an alternative execution strategy of Prolog can provide the same semantics as Mangle and other Datalog "variants", then it may be worth considering doing that instead of devising a different syntactic formalism, especially if the alternative syntactic formalism is already so close to valid Prolog syntax as it is in this concrete case.

Yes, I’m well aware of tabling. But high performance Datalog implementations will fundamentally be very different than the usual Prolog implementations. But sure, it isn’t impossible. Just long a ways from ISO Prolog.

Consider Soufflé, designed for large scale program analysis. Or LogiQL, designed for efficient incremental evaluation.

You also failed to disclose that you are not exactly unbiased individual here.

> Consider Soufflé, designed for large scale program analysis. Or LogiQL, designed for efficient incremental evaluation.

I have now looked up these systems: They are far removed from standard Prolog syntax, much farther than the system outlined in the posted article, which uses almost Prolog syntax to such an extreme extent that one of the given examples is already valid Prolog syntax and even a valid Prolog program without any changes. Therefore, the question I posted above ("If you are that close to using Prolog syntax, ...") does not apply to the same extent to these systems.

> You also failed to disclose that you are not exactly unbiased individual here.

Personally, I believe I have a solid grasp on the benefits of using standard Prolog syntax and the disadvantages of so slightly deviating from it, and I also believe that the question I asked is justified and what I stated in this thread is true.

If you see a need for any disclosure beyond what I stated above, please add it and what you believe to be the nature of my bias as your reply to this post for everyone to see. Thank you a lot!

Datalog programs guarantee termination while Prolog programs do not.
Yes, the subset of Prolog that is called Datalog can be interpreted in such a way that termination is guaranteed, and that is a nice and valuable property of this specific subset, i.e., the functor-free subset, of Prolog.

This is a good example to illustrate why it can be useful to interpret specific Prolog programs and fragments in specific ways, which may also differ from the default execution strategy.

As a specific instance of such an execution strategy, we can again consider SLG resolution: If we interpret any pure Prolog program that does not use compound terms with SLG resolution, then every query we post terminates universally.

I don’t understand what you mean. Okay, you can create anything you want on top of Prolog. Is your point that authors of Mangle should have created a Prolog instead of their own Datalog?

Or is more “why don’t they just use Prolog”.

The code snippets shown on the page are almost valid Prolog syntax. The only issues that prevent them from being actual Prolog syntax are, as far as I can tell, very minor syntactic issues. For example, != cannot be defined as an infix operator in Prolog because ! is a so-called solo char and therefore != is not admissible as a single token in Prolog. The obvious fix for this concrete issue is to replace every occurrence of != in these snippets with the well-known Prolog predicate dif/2 which holds if and only if its two arguments are different.

For instance, the first example from the page:

    projects_with_vulnerable_log4j(P) :-
      projects(P),
      contains_jar(P, "log4j", Version),
      Version != "2.17.1",
      Version != "2.12.4",
      Version != "2.3.2".
would become:

    projects_with_vulnerable_log4j(P) :-
      projects(P),
      contains_jar(P, "log4j", Version),
      dif(Version, "2.17.1"),
      dif(Version, "2.12.4"),
      dif(Version, "2.3.2").
or, shorter and equivalently, by using full Prolog including higher-order predicates such as maplist/2:

    projects_with_vulnerable_log4j(P) :-
      projects(P),
      contains_jar(P, "log4j", Version),
      maplist(dif(Version), ["2.17.1","2.12.4","2.3.2"]).
This third version could of course be generated automatically from the snippet above. We could also define dif or many other tokens as infix operators, so that we can use operator notation as in the original snippet, all while keeping to standard Prolog syntax.

Analogously for |> and the let-bindings that occur in the sample snippets: Chances are that they can be expressed somehow also with standard Prolog syntax with suitable operator definitions, or with small conforming extensions if absolutely required.

The third example in the README is already perfectly valid Prolog code, and can be parsed and interpreted with every conforming Prolog system:

    contains_jar(P, Name, Version) :-
      contains_jar_directly(P, Name, Version).

    contains_jar(P, Name, Version) :-
      project_depends(P, Q),
      contains_jar(Q, Name, Version).
The main issue here is: This "own Datalog" is extremely close to Prolog syntax, much closer to Prolog than any other Datalog "variant" I have so far seen on HN. The question therefore is: Why not go the small extra step and just use standard Prolog syntax? One advantage of this is that such programs could then be read and reasoned about directly (i.e., without requiring any manual parsing) with every conforming Prolog system. Another advantage is that such programs would also automatically benefit from all improvements in Prolog engines, either via meta-interpretation or by virtue of being valid Prolog programs too. For instance, the first snippet shown in the README, with the small syntactic change I outlined, is already a valid Prolog program and can be run with SICStus Prolog etc.

Using Prolog syntax does not mean that that the programs have to be interpreted in the same way as Prolog would by default. It is also possible to syntactically include constructs that cannot be directly interpreted in Prolog, but require a custom interpreter. The point about syntactic compatibility stands regardless.

It was a conscious choice to stick the prolog-like syntax because there are numerous teaching resources (e.g. "foundations of databases") and also academic research papers available that all discuss datalog in this syntax.

Like tmptmpgo I consider datalog something different entirely and worthy of study in its own right; a small kernel (or toy) language that by itself is not sufficient for many day-to-day tasks.

It is enough to cover relational algebra with recursion fixpoints. This gives us a handle on SQL queries. What we want is a well-behaved, predicable mix of programming and those relational operations. We have to end this SQL madness.

IMHO, it is undeniably true that Prolog is one particular way of achieving such a mix (and standardized etc), but it is not the predictable, easy-to-use and well-behaved mix we want. We should be able to isolate syntactic fragments where it is easy to ensure safety properties, at the minimum the datalog fragment. That is why having a fragment correspond as close as possible to textbook Datalog was a design constraint for me.

That said, when we talk implementation, there are of course a lot of optimizations that apply equally to Datalog and Prolog, say the magic set transformation.

>pure monotonic core of the language

What does that mean? Monotonic describes a function, or a sequence, that only every increases or decreases, never a combination of both. What does an "always decreasing" or "always increasing" language core look like? This may make sense for concatenative languages.

In logic programming monotonic means that adding a fact or rule does not remove an answer.

Like SQL has monotonic queries where adding entries to a relation never gives you fewer answers.

More formally: Database1 subset Database2 implies Query(Database1) subset Query(Database2) for all databases

Which is exactly the definition of monotonicity (x =< y implies f(x) =< f(y)) applied to sets and functions on sets.

I guess that makes sense. I wonder though what other adjectives might apply to a "language core" other than "monotonic"? Why did you pick that one? Also are there any language are are monotonically decreasing? This is less useless than it sounds, since this is an elegant way to model decay, and decay is hard to model.