Hacker News new | ask | show | jobs
by triska 1297 days ago
The code snippets shown on the page are almost valid Prolog syntax. The only issues that prevent them from being actual Prolog syntax are, as far as I can tell, very minor syntactic issues. For example, != cannot be defined as an infix operator in Prolog because ! is a so-called solo char and therefore != is not admissible as a single token in Prolog. The obvious fix for this concrete issue is to replace every occurrence of != in these snippets with the well-known Prolog predicate dif/2 which holds if and only if its two arguments are different.

For instance, the first example from the page:

    projects_with_vulnerable_log4j(P) :-
      projects(P),
      contains_jar(P, "log4j", Version),
      Version != "2.17.1",
      Version != "2.12.4",
      Version != "2.3.2".
would become:

    projects_with_vulnerable_log4j(P) :-
      projects(P),
      contains_jar(P, "log4j", Version),
      dif(Version, "2.17.1"),
      dif(Version, "2.12.4"),
      dif(Version, "2.3.2").
or, shorter and equivalently, by using full Prolog including higher-order predicates such as maplist/2:

    projects_with_vulnerable_log4j(P) :-
      projects(P),
      contains_jar(P, "log4j", Version),
      maplist(dif(Version), ["2.17.1","2.12.4","2.3.2"]).
This third version could of course be generated automatically from the snippet above. We could also define dif or many other tokens as infix operators, so that we can use operator notation as in the original snippet, all while keeping to standard Prolog syntax.

Analogously for |> and the let-bindings that occur in the sample snippets: Chances are that they can be expressed somehow also with standard Prolog syntax with suitable operator definitions, or with small conforming extensions if absolutely required.

The third example in the README is already perfectly valid Prolog code, and can be parsed and interpreted with every conforming Prolog system:

    contains_jar(P, Name, Version) :-
      contains_jar_directly(P, Name, Version).

    contains_jar(P, Name, Version) :-
      project_depends(P, Q),
      contains_jar(Q, Name, Version).
The main issue here is: This "own Datalog" is extremely close to Prolog syntax, much closer to Prolog than any other Datalog "variant" I have so far seen on HN. The question therefore is: Why not go the small extra step and just use standard Prolog syntax? One advantage of this is that such programs could then be read and reasoned about directly (i.e., without requiring any manual parsing) with every conforming Prolog system. Another advantage is that such programs would also automatically benefit from all improvements in Prolog engines, either via meta-interpretation or by virtue of being valid Prolog programs too. For instance, the first snippet shown in the README, with the small syntactic change I outlined, is already a valid Prolog program and can be run with SICStus Prolog etc.

Using Prolog syntax does not mean that that the programs have to be interpreted in the same way as Prolog would by default. It is also possible to syntactically include constructs that cannot be directly interpreted in Prolog, but require a custom interpreter. The point about syntactic compatibility stands regardless.

1 comments

It was a conscious choice to stick the prolog-like syntax because there are numerous teaching resources (e.g. "foundations of databases") and also academic research papers available that all discuss datalog in this syntax.

Like tmptmpgo I consider datalog something different entirely and worthy of study in its own right; a small kernel (or toy) language that by itself is not sufficient for many day-to-day tasks.

It is enough to cover relational algebra with recursion fixpoints. This gives us a handle on SQL queries. What we want is a well-behaved, predicable mix of programming and those relational operations. We have to end this SQL madness.

IMHO, it is undeniably true that Prolog is one particular way of achieving such a mix (and standardized etc), but it is not the predictable, easy-to-use and well-behaved mix we want. We should be able to isolate syntactic fragments where it is easy to ensure safety properties, at the minimum the datalog fragment. That is why having a fragment correspond as close as possible to textbook Datalog was a design constraint for me.

That said, when we talk implementation, there are of course a lot of optimizations that apply equally to Datalog and Prolog, say the magic set transformation.

> What we want is a well-behaved, predicable mix of programming and those relational operations. We have to end this SQL madness.

These are great goals! Thank you a lot for working on them, and for taking the time to participate here!

> We should be able to isolate syntactic fragments where it is easy to ensure safety properties, at the minimum the datalog fragment.

Again a great goal! Luckily, for every given Prolog program, it can be decided statically whether it is in the usually considered Datalog fragment. Likewise, for every given Prolog program, many things can be statically decided, for example it can be said statically, i.e., by just "looking" at the program without running it, with certainty that impure constructs do not occur in a given Prolog program and that these transformations can be applied without changing the declarative meaning of the program.

I think it is a very good idea to restrict oneself to a fragment of Prolog, especially if you need a lot of leeway to optimize programs, and to reason about them. For example, query optimization is much easier if you know that no side-effects occur in a Prolog program.

The question is whether all this needs any syntax beyond what standard Prolog already provides. Prolog is a very flexible language and supports user-defined operators. If possible, I recommend to keep to this standardized syntax, and to use a sensible fragment of it to facilitate reasoning, at least for as many programs as possible. The availability of existing teaching material and research papers you mentioned is another excellent argument in favor of this approach. It is perfectly OK to only support a proper subset of full Prolog, in fact that is exactly what Datalog does!

Looks like Mangle differs from Prolog when it needs to express aggregation. I am curious what prolog dialect/library you think has good(or best?) aggregation syntax?