Hacker News new | ask | show | jobs
by westurner 934 days ago
"Introduction to Datalog" re: Linked Data https://news.ycombinator.com/context?id=34808887

pyDatalog/examples/SQLAlchemy.py: https://github.com/baojie/pydatalog/blob/master/pyDatalog/ex...

GH topics > datalog: https://github.com/topics/datalog

datalog?l=rust: https://github.com/topics/datalog?l=rust ... Cozo, Crepe

Crepe: https://github.com/ekzhang/crepe :

> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.

Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :

> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.

Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi

FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :

> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).

ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15

Fast CUDA hashmaps

Gdlog is built on CuCollections.

GPU HashMap libs to benchmark: Warpcore, CuCollections,

https://github.com/NVIDIA/cuCollections

https://github.com/NVIDIA/cccl

https://github.com/sleeepyjack/warpcore

/? Rocm HashMap

DeMoriarty/DOKsparse: https://github.com/DeMoriarty/DOKSparse

/? SIMD hashmap

Google's SwissTable: https://github.com/topics/swisstable

rust-lang/hashbrown: https://github.com/rust-lang/hashbrown

CuPy has array but not yet hashmaps, or (GPU) SIMD FWICS?

NumPy does SIMD: https://numpy.org/doc/stable/reference/simd/

google/highway: https://github.com/google/highway

xtensor-stack/xsimd: https://github.com/xtensor-stack/xsimd

GH topics > HashMap: https://github.com/topics/hashmap

1 comments

Yeah, IncA (compiling to Souffle) and Ascent (I believe Crepe is not parallel, though also a good engine) are two other relevant cites here. Apropos linked data, our group has an MPI-based engine which is built around linked facts (subsequently enabling defunctionalization, ad-hoc polymorphism, etc..), which is very reminiscent of the discussion in the first link of yours: https://arxiv.org/abs/2211.11573
TIL about Approximate Reasoning.

"Approximate Reasoning for Large-Scale ABox in OWL DL Based on Neural-Symbolic Learning" (2023) > Parameter Settings of the CFR [2023 ChunfyReasoner] and NMT4RDFS [2018] in the Experiments. https://www.researchgate.net/figure/Parameter-Settings-of-th...

"Deep learning for noise-tolerant RDFS reasoning" (2018) > NMT4RDFS: http://www.semantic-web-journal.net/content/deep-learning-no... :

> This paper documents a novel approach that extends noise-tolerance in the SW to full RDFS reasoning. Our embedding technique— that is tailored for RDFS reasoning— consists of layering RDF graphs and encoding them in the form of 3D adjacency matrices where each layer layout forms a graph word. Each input graph and its entailments are then represented as sequences of graph words, and RDFS inference can be formulated as translation of these graph words sequences, achieved through neural machine translation. Our evaluation on LUBM1 synthetic dataset shows 97% validation accuracy and 87.76% on a subset of DBpedia while demonstrating a noise-tolerance unavailable with rule-based reasoners.

NMT4RDFS: https://github.com/Bassem-Makni/NMT4RDFS

...

A human-generated review article with an emphasis on standards; with citations to summarize:

"Why do we need SWRL and RIF in an OWL2 world?" [with SPARQL CONSTRUCT, SPIN, and now SHACL] https://answers.knowledgegraph.tech/t/why-do-we-need-swrl-an...

https://spinrdf.org/spin-shacl.html :

> From SPIN to SHACL In July 2017, the W3C has ratified the Shapes Constraint Language (SHACL) as an official W3C Recommendation. SHACL was strongly influenced by SPIN and can be regarded as its legitimate successor. This document explains how the two languages relate and shows how basically every SPIN feature has a direct equivalent in SHACL, while SHACL improves over the features explored by SPIN

/? Shacl datalog https://www.google.com/search?q=%22shacl%22+%22datalog%22

"Reconciling SHACL and Ontologies: Semantics and Validation via Rewriting" (2023) https://scholar.google.com/scholar?q=Reconciling+SHACL+and+O... :

> SHACL is used for expressing integrity constraints on complete data, while OWL allows inferring implicit facts from incomplete data; SHACL reasoners perform validation, while OWL reasoners do logical inference. Integrating these two tasks into one uniform approach is a relevant but challenging problem.

"Well-founded Semantics for Recursive SHACL" (2022) [and datalog] https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A169...

"SHACL Constraints with Inference Rules" (2019) https://arxiv.org/abs/1911.00598 https://scholar.google.com/scholar?cites=1685576975485159766...

Datalog > Evaluation: https://en.wikipedia.org/wiki/Datalog#Evaluation

...

VMware/ddlog: Differential datalog

> Bottom-up: DDlog starts from a set of input facts and computes all possible derived facts by following user-defined rules, in a bottom-up fashion. In contrast, top-down engines are optimized to answer individual user queries without computing all possible facts ahead of time. For example, given a Datalog program that computes pairs of connected vertices in a graph, a bottom-up engine maintains the set of all such pairs. A top-down engine, on the other hand, is triggered by a user query to determine whether a pair of vertices is connected and handles the query by searching for a derivation chain back to ground facts. The bottom-up approach is preferable in applications where all derived facts must be computed ahead of time and in applications where the cost of initial computation is amortized across a large number of queries.

From https://community.openlinksw.com/t/virtuoso-openlink-reasoni... https://github.com/openlink/virtuoso-opensource/issues/660 :

> The Virtuoso built-in (rule sets) and custom inferencing and reasoning is backward chaining, where the inferred results are materialised at query runtime. This results in fewer physical triples having to exist in the database, saving space and ultimately cost of ownership, i.e., less physical resources are required, compared to forward chaining where the inferred data is pre-generated as physical triples, requiring more physical resources for hosting the data.

FWIU it's called ShaclSail, and there's a NotifyingSail: org.eclipse.rdf4j.sail.shacl.ShaclSail: https://rdf4j.org/javadoc/3.2.0/org/eclipse/rdf4j/sail/shacl...