| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by FjordWarden 507 days ago

If you expect Jena to be more battle-tested because it is older, forget it, if the process is killed by a unexpected shutdown or some other reason it results in data corruption. At least this was my experience a few years ago.

I found graph databases a beguiling idea when I first learned about them, and this is a welcome addition, but I've since temperated my excitement. They are not as flexible and universal a modal as is often promised. Everything is a graph, sure but the result of your SPARQL query not necessarily.

I found classical DBMS based on sets/multisets to be much easier to compose from a querying point of view. A table is a set/multiset and a result of a query is also a set/multiset, SPARQL guarantees no such composability. Maybe, if you want to start mucking around with inference engines, but you'll either run into problems of undecidability.

4 comments

PaulHoule 507 days ago

Jena lets you make little in-memory triple stores that you can use the way people use the list-map-scalar trinity. I've been working on this publication about that (RDF for difficult cases and when ordering counts) for years and it just got published last week

https://www.iso.org/standard/76310.html

I'll call out my collabortor Liju Fan for being the only person I've met who knew how to do anything interesting with OWL. (Well, I can do interesting things now but I owe it all to her.)

(For the research for that paper I used rdflib under PyPi because CPython was not fast enough.)

When I needed big persistent triple stores (that you use the way you might use postgres) I used to use

https://en.wikipedia.org/wiki/Virtuoso_Universal_Server

and had pretty good luck if I loaded a billion triples if I used plenty of 'stabilizers' (create a new AWS instance with ample RAM, use scripts to load a billion triples starting from an empty database, shut it down, make an AMI, start a new instance with the AMI, expect it to warm up for 20 minutes or so before query performance is good)

I don't regularly build systems on SPARQL today because of problems with updating. In particular, SQL has an idea of a "record" which is a row in a table, document oriented databases have an idea of a "record" which is a bit more flexible. Updating a SPARQL database is a little bit dangerous because there is no intrinsic idea of what a record is; i mean, you can define one by starting at a particular URI and traversing to the right across blank nodes and saying it is a 'record' and it works OK. But it's a discipline that I impose on it with my libraries, it ought to be baked into standards, baked into the databases, wrapped up in transactions, etc. For anything OLTP-ish I am still using SQL or document-oriented databases, but I hate the lack of namespaces and similar affordances that make SPARQL scalable in terms of "smash together a bunch of data from different sources" in document-oriented databases wheras SPARQL is missing the affordances you have in document-oriented databases for handling ordered collections. We badly need a SPARQL 2 which makes the kind of work that I talk about in that technical report easy.