|
Yeah, there are a ton of substantively different approaches to modern Datalogs, targeting different applications. To start off: Datalog is distinguished from traditional SQL in its focus on heavily-recursive reachability-based reasoning. With respect to expressivity, you can see Datalog as CDCL/DPLL restricted to boolean constraint propagation (i.e., Horn clauses). Operationally, you can think of this as: tight range-indexed loops which are performing insertion/deduplication into an (indexed) relation-backing data structure (a BTree/trie/etc...). In SQL, you don't know the query a-priori, so you can't just index everything--but in Datalog, you know all of the rules up-front and can generate indices for everything. This ubiquitous indexing enables the state-of-the-art work we see with Datalog in static analysis (DOOP, cclyzer), security (ddisasm), etc... Our group targets tasks like code analysis and these big graph problems because we think they represent the most computationally-complex, hard problems that we are capable of doing. The next step here is to scale our prototypes (a handful of rules) to large, realistic systems--some potential applications of that are, e.g., raw feature extraction for binaries when you do ML over binary corpuses (which otherwise require, e.g., running IDA) on the GPU (rather than IDA on the CPU), medical reasoning (accelerating MediKanren), and (hopefully) probabilistic programming (these neuro-symbolic applications). By contrast, I think work which takes a more traditional Databases approach (CodeQL, RDFox, ...) focus a little less on ubiquitous high-performance range-indexed insertion in a tight loop, and focus a little more on supporting robust querying and especially operating on streams. There is some very cool related work there in differential dataflow (upon which differential Datalog is built). There is a solver there named DDlog (written in Rust) which takes that approach. Our in-house experiments show that DDlog is often a constant factor slower than Souffle on GPUs, and we did not directly compare against DDlog in this paper--I expect the results would be roughly similar to Souffle. |
This was historically true but SQL has CTE's and recursive CTE's these days, and even some extra syntactic sugar for reachability query's. And of course (given the former) "inference" and "deduction" over data are just a CREATE VIEW statement away. This is the "you know all of the rules up-front" part; a VIEW is just a query that you do know upfront. Most uses of "deductive" databases are just for querying within some in-memory database, which is not really playing in the same league as a fully general RDBMS. This is not intended to dismiss the work in OP, of course; if anything, its applicability need not be restricted to a tiny niche of software intended for specific application areas, and can be quite a bit broader than that.