by cyocum 1405 days ago
The author of this post mentions the Humanities at the end of their post and TerminusDB. I work on a Humanities based project which uses the Semantic Web (https://github.com/cyocum/irish-gen) and I have looked at TerminusDB a couple of times.

The main factor in my choice of technologies for my project was the ability to infer data from other data. OWL was the defining solution for my project. This is mainly because I am only one person, so I needed the computer to derive data that was logically implied but that I would otherwise be forced to encode by hand. OWL is what made the project tractable for a single person (or a couple of people) to work on.
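As a rough illustration of what "logically implied" data means here, the sketch below applies three OWL-style rules (an inverse property, a sub-property, and a transitive property) to a handful of triples until nothing new appears. The names and property labels are made up for the example and are not taken from irish-gen; a real OWL reasoner covers far more than this.

```python
# Hypothetical triples; names and properties are illustrative only.
asserted = {
    ("Niall", "hasSon", "Eogan"),
    ("Eogan", "hasSon", "Muiredach"),
}


def infer(triples):
    """Apply three OWL-style rules repeatedly until a fixed point is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "hasSon":
                # Inverse property: hasSon(x, y) implies hasFather(y, x).
                new.add((o, "hasFather", s))
                # Sub-property: hasSon(x, y) implies hasDescendant(x, y).
                new.add((s, "hasDescendant", o))
        # Transitive property: hasDescendant(a, b) and hasDescendant(b, d)
        # together imply hasDescendant(a, d).
        desc = [(s, o) for s, p, o in facts if p == "hasDescendant"]
        for a, b in desc:
            for c, d in desc:
                if b == c:
                    new.add((a, "hasDescendant", d))
        if new <= facts:  # nothing new was derived, so stop
            return facts
        facts |= new


# Only the triples that were derived rather than entered by hand.
inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Two hand-entered triples yield five derived ones, which is the kind of saving that adds up over a large genealogy.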

The author brings up several points that I have also run into myself. The Open World Assumption makes things difficult to reason about and makes it hard to understand what is meant by a URL. Another problem that I have run into is that debugging OWL is a nightmare. I have no way to hold the reasoner to account, so when I run a SPARQL query I cannot tell whether the results are sane. I cannot ask the reasoner "how did you come up with this inference?" and have it tell me. That means if I run a query, I must go back to the manuscript (MS) sources to double-check that nothing has gone wrong, and fix the database if it has.

Another problem that the author discusses is what I call "Academic Abandonware". There are tools out there, but only the academic who worked on them knows how to make them work. The documentation is usually nonexistent, and trying to figure things out can take a lot of precious time.

I will probably have another look at TerminusDB in due course but it will need to have a reasoner as powerful as the OWL ones and an ease of use factor to entice me to shift my entire project at this point.

2 comments

> I work on a Humanities based project which uses the Semantic Web (https://github.com/cyocum/irish-gen) and I have looked at TerminusDB a couple of times.

I had never come across anything like this before, but this is a wonderful project.

"Reasoning" capability can be added to any conventional database via the use of views, and sometimes custom indexes. The real problem is that it's computationally expensive for non-trivial cases.
As you put the word Reasoning in quotation marks, I might misunderstand your bottom line here (I am Autistic, so please do not get quirky on natural language semantics), but the bare statement "Reasoning can be added to any conventional database" is just not right. Reasoning is a well-defined notion from logic that is based on formal languages, semantics, and a relation called entailment (or inference, in proof theory). None of that exists natively in a database. In the literature, there are two well-known ways of integrating a notion of reasoning into a database. Firstly, Datalog was invented to express recursive queries. Datalog's relation to reasoning was a side effect, and it only covers a fragment based on Horn clauses. On the other hand, there is OWL 2 QL, a (limited) profile of OWL that encodes some kind of reasoning via query expansion on vanilla SQL queries. So maybe you can elaborate on the notion of "using views, and sometimes custom indexes, to add reasoning to a conventional database".
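To make the Datalog point concrete, here is a minimal sketch of the classic recursive "ancestor" rule pair, written as a recursive common table expression and run through Python's built-in sqlite3 module. The table and column names are invented for the example, and a recursive CTE is only a stand-in for Datalog's fixed-point evaluation, not a Datalog engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent_id TEXT, child_id TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d")],
)

# Datalog equivalent (two Horn clauses):
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
rows = conn.execute(
    """
    WITH RECURSIVE ancestor(anc, des) AS (
        SELECT parent_id, child_id FROM parent
        UNION
        SELECT p.parent_id, a.des
        FROM parent AS p
        JOIN ancestor AS a ON p.child_id = a.anc
    )
    SELECT anc, des FROM ancestor ORDER BY anc, des
    """
).fetchall()

for anc, des in rows:
    print(anc, "is an ancestor of", des)
```

The three stored parent rows produce six ancestor pairs. The recursion is what a plain, non-recursive view cannot express.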
You can think of views as modelling a particular, somewhat restricted, sort of implication. Despite the restriction, it may be sufficient to cover many usages of OWL, but you may need to squint a bit. By that I mean that a view is not exactly an implementation of implication, but it can be used to model it, so some degree of reinterpretation of the resulting set of tables and views may be needed. The type of implication supported is roughly "a result of a given SQL query over (a combination of) existing tables and views => a new record in a fresh table/relation".
I hardly see how you can define in an RDBMS that a resource that has both an engine and four wheels should be seen as a car, at least not without descending into a nightmare of unbearable SQL...
The SQL for describing "resources that contain other resources" gets a bit unidiomatic, but defining a query for those that have e.g. an engine and four wheels is quite easy. Then you can add that as a custom view, so that your inferred data is in turn available and queryable on an equal basis with raw input to the knowledge base.
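A minimal sketch of that idea, using Python's built-in sqlite3 module. The has_part table, its columns, and the sample rows are all hypothetical, chosen only to show the shape of the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per part that a resource contains.
conn.execute(
    "CREATE TABLE has_part (resource TEXT, part_id TEXT, part_kind TEXT)"
)
conn.executemany(
    "INSERT INTO has_part VALUES (?, ?, ?)",
    [
        ("r1", "e1", "engine"),
        ("r1", "w1", "wheel"), ("r1", "w2", "wheel"),
        ("r1", "w3", "wheel"), ("r1", "w4", "wheel"),
        ("r2", "e2", "engine"),  # two wheels only: not a car
        ("r2", "w5", "wheel"), ("r2", "w6", "wheel"),
        ("r3", "w7", "wheel"), ("r3", "w8", "wheel"),  # no engine: not a car
        ("r3", "w9", "wheel"), ("r3", "w10", "wheel"),
    ],
)

# The "inference": anything with at least one engine and exactly four wheels
# is exposed as a car. In SQLite a comparison evaluates to 0 or 1, so SUM
# counts the matching rows in each group.
conn.execute(
    """
    CREATE VIEW car AS
    SELECT resource
    FROM has_part
    GROUP BY resource
    HAVING SUM(part_kind = 'engine') >= 1
       AND SUM(part_kind = 'wheel') = 4
    """
)

# The view can be queried like any stored table.
cars = [row[0] for row in conn.execute("SELECT resource FROM car ORDER BY resource")]
print(cars)
```

Note that counting "exactly four wheels" this way relies on the database's closed-world view of the data: wheels that are not recorded are assumed not to exist, which is not how OWL treats missing facts.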
Sure. But maintaining the coherence between your business data model definitions and their implementation in the RDBMS can quickly become a massive headache, don't you think?