Hacker News
by j-pb 1459 days ago
I'm not gonna add much, because I think we're on a similar page in terms of RDF-rantyness. I find the entire semantic web/linked data space horribly bloated and overcomplicated.

However! There is something to be said about triples and normalisation. Is the general idea of triples a really good format for databases? Maybe not. Is it a really good format for knowledge representation? Yeah I think so.

Real-world knowledge is quite messy and riddled with exceptions. People from cultures without a last name. A stump is still a tree in the right context, even if it doesn't have a crown. A character in D&D might have traits that are completely uncommon for their class. The Greek gods for sleep, death or the night sky are both concepts and characters.

You can't model these things with a database, which is primarily tailored towards modelling the "inner world" of a computer.

The semantic web is still frustratingly bad at these things, because of description logics, OWL and their mind-share that _everything must be class based_ and enforced/checked at creation/load time.

In reality it's much better to throw all of that away, and just do duck typing at query time, by letting the consumer decide which entities they want to process. Sure some entities might not get processed, because they don't conform to the shape that the consumer expects, but that's a strength. A different system might consume them, they might be ignored indefinitely, or they are handled at a later time.
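A minimal sketch of what "duck typing at query time" could look like, assuming a toy in-memory list of triples. The data, the attribute names and the helper `entities_with` are all made up for illustration; nothing here comes from an actual RDF library.

```python
from collections import defaultdict

# Hypothetical toy data: (entity, attribute, value) triples.
TRIPLES = [
    ("alice", "first_name", "Alice"),
    ("alice", "last_name", "Liddell"),
    ("sukarno", "first_name", "Sukarno"),  # a person without a last name
    ("stump", "height_m", 0.4),
]

def entities_with(triples, required):
    """Return {entity: {attr: value}} for entities that carry every required attribute.

    Nothing is validated at load time; each consumer states the shape it needs.
    (A sketch only: multi-valued attributes would overwrite each other here.)
    """
    by_entity = defaultdict(dict)
    for entity, attr, value in triples:
        by_entity[entity][attr] = value
    return {e: attrs for e, attrs in by_entity.items()
            if set(required).issubset(attrs)}

# Two consumers, two shapes, same data. Entities that don't fit are simply skipped.
full_names = entities_with(TRIPLES, {"first_name", "last_name"})
greetable = entities_with(TRIPLES, {"first_name"})
```

The entity without a last name is invisible to the first consumer and perfectly usable by the second, and neither had to agree on a class beforehand.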

The directedness of individual facts also allows you to implicitly encode "who makes this assertion", providing a mechanism to make distributed consistency much easier.

Limiting the number of columns (to 3) also allows you to materialise all possible indices (for each ordering), which is really interesting in combination with worst case optimal join algorithms.
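To make the "all possible indices" point concrete, here is a rough sketch assuming triples are plain tuples of strings and each index is just a sorted list. The names `build_indices` and `prefix_scan` are hypothetical; a real engine would use tries or B-trees, and the join algorithm itself is not shown.

```python
from bisect import bisect_left
from itertools import permutations

def build_indices(triples):
    """Return one sorted list per column ordering of (e, a, v): 3! = 6 in total."""
    names = "eav"
    indices = {}
    for order in permutations(range(3)):
        key = "".join(names[i] for i in order)
        indices[key] = sorted(tuple(t[i] for i in order) for t in triples)
    return indices

def prefix_scan(index, prefix):
    """Yield every row of a sorted index that starts with the given prefix tuple."""
    start = bisect_left(index, prefix)
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break
        yield row

# Hypothetical toy data.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]
INDICES = build_indices(TRIPLES)

# Whatever is bound first (entity, attribute or value), some index has it as its prefix.
knows_rows = list(prefix_scan(INDICES["aev"], ("knows",)))
bob_as_value = list(prefix_scan(INDICES["vae"], ("bob",)))
```

With only three columns, six orderings cover every possible binding pattern, so any variable order a join picks can be served by a sorted prefix scan.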

However, nothing in the RDF ecosystem makes use of these strengths. It's all rigid, classy, complex, slow and buggy, but I don't think that a heavily normalised knowledge base built on triples has to be that way.

2 comments

Huh? OWL doesn't check anything when you load data. What OWL does is infer new facts based on the facts that are there.

For instance, if you put in an axiom that says "a manager must manage one or more employees" then the system will infer that X is a manager once you add a fact that Y is managed by X. Classes in OWL are classes as in "classification", not classes like Java where you have to create a class simply to have a place in memory to put facts.
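A toy illustration of "infer, don't check", written as a single hand-coded forward rule. This is not an OWL reasoner and doesn't claim to follow OWL semantics; the predicate names `managed_by` and `is_a` and the function `infer_managers` are invented for the example.

```python
def infer_managers(facts):
    """Return facts plus (x, 'is_a', 'Manager') for every (y, 'managed_by', x).

    Input is never rejected: the rule only ever adds facts.
    """
    inferred = set(facts)
    for subject, predicate, obj in facts:
        if predicate == "managed_by":
            inferred.add((obj, "is_a", "Manager"))
    return inferred

# Hypothetical input: nobody declared xavier to be a manager.
FACTS = {("yolanda", "managed_by", "xavier")}
CLOSED = infer_managers(FACTS)
```

Nothing was validated when the fact was added; the classification of `xavier` falls out of the data afterwards.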

Some of the reason why people "just don't get RDF" is that it works exactly in the opposite of conventional systems and that creates so much cognitive dissonance that you can see people's brains shorting out when they encounter it.

Sorry for my imprecision. Yes, OWL can be used like that, and automatic tagging is one of the few good use cases.

But in reality A-Box completion is not the big use case for OWL. T-Box model checking is.

All those fancy bioinformatics ontologies and "databases" that get paraded around by the DL folks. All those lower ontologies. There is not a single A-Box fact describing genes, diseases, products, objects or whatever. It's all T-Box concepts.

I mean, there's papers lamenting common RDF database T-Box size and performance limitations, because they want to collect medical data, but have to shoehorn it into the ontology.

That's also something that the authors don't seem to get. SHACL popped up because people wanted to have something that operates over their A-Boxes without slowly dragging their entire modeling and data storage into the T-Box. That's why they don't want the "description logic perspective", as it automatically leads down that "no instances, just theories" rabbit hole.

As an aside: even if OWL were used for classification only, it'd be rather moot. So you've classified something as a manager. Once you act on it, e.g. by having a query that only asks for manager entities, you are stuck with the same brittle class-based approach, where the query requires more constraints than it actually needs. The query already contains all the properties required; it's its own anonymous classification, so to speak.

Practically I work with SPIN. I wish somebody would make a production rules engine that was easy to live with. I want to like Drools but I can't read the error messages for complex programs I write. With Jena Rules the system is simple enough that I can figure problems out looking at the source code but it doesn't have as many features.

Unfortunately logic is a depressing subject because it starts with a bunch of theorems about what is impossible (Gödel, Tarski, Turing). There is no system of negation that is without problems (OWL takes the radical choice of no negation). Commonsense reasoning involves a lot of "Alice thinks that Jane thinks that..." and "A was true until 12:30 this afternoon, now A is false".

The theory vs interpretation split is another one of those decisions you have to make if you want to do logic: I am on a committee where I'm the guy who speaks for interpretations and the A-Box but some of the other people are serious T-Boxers.

It amazes me that this system

http://inform7.com/

creates an illusion of letting an English major write a script for an adventure game that reads like English that someone can play in what looks like a subset of English. It does it all with a very primitive production rules engine that relies heavily on defaults. Practical logic requires attention to rules and "schemes" (X macros, configuration settings on the rules engine.) I wrote an adventure game with a few rooms and objects in Drools and dreamt of making something like "Inform 7 for business rules".

> Commonsense reasoning involves a lot of "Alice thinks that Jane thinks that..." and "A was true until 12:30 this afternoon, now A is false".

These are both examples of modalities. From a formal point of view, description logics are special cases of multi-modal logics. The semantics of these can in turn be understood as computationally well-behaved restrictions of FOL, where the logical quantifiers are understood to range over so-called "possible worlds".

The issue with the DL approach to FOL fragments is that the operator-subset approach is too coarse-grained. We need something more fine-grained that allows for syntactic and semantic constraints over the specific interpretations and theories. An example of such an approach is stratified negation in datalog. DL would simply ban negation or recursion, but limiting the syntax of datalog programs allows for correct semantics while still allowing solutions for many interesting problems.
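For anyone who hasn't seen stratified negation, here is a small hand-evaluated sketch. The program in the comment is a made-up example, and the Python just evaluates its two strata in order; it is not a general datalog engine.

```python
# Hypothetical datalog program being evaluated:
#   reachable(X)   :- start(X).
#   reachable(Y)   :- reachable(X), edge(X, Y).
#   unreachable(X) :- node(X), not reachable(X).
# The negation is allowed because `reachable` never depends on `unreachable`,
# so `reachable` can be computed completely before it is negated.

def reachable_from(start, edges):
    """Stratum 1: recursive, negation-free rules evaluated to a fixpoint."""
    reached = set(start)
    changed = True
    while changed:
        changed = False
        for x, y in edges:
            if x in reached and y not in reached:
                reached.add(y)
                changed = True
    return reached

def unreachable(nodes, reached):
    """Stratum 2: negation over the finished result of stratum 1."""
    return {n for n in nodes if n not in reached}

NODES = {"a", "b", "c", "d", "e"}
EDGES = {("a", "b"), ("b", "c"), ("d", "e")}
REACHED = reachable_from({"a"}, EDGES)
UNREACHED = unreachable(NODES, REACHED)
```

Both recursion and negation are in play, yet the result is unambiguous, because the syntactic restriction fixes the order in which the rules are evaluated.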

One of our running gags is: "Snomed might not be able to give you a diagnosis on why you are sick, or provide you with a treatment plan, but thanks to the power and efficiency of description logics it can tell you that your leg bone is connected to your hip bone and that both are bones 80 billion times per second."

It is fun with Snomed to track a blood vessel in the periphery all the way to the heart and back.

> However! There is something to be said about triples and normalisation. Is the general idea of triples a really good format for databases? Maybe not. Is it a really good format for knowledge representation? Yeah I think so.

There's Datomic and XTDB as practical examples of databases built on data models that are equal or similar to triples.

Right, but those are not semantic web technologies and not very often used. Before someone jumps in with "but it's used at....": I'm not saying people aren't using them, but in the grand scheme they are extremely niche products.