| HN Mirror


by zcw100 1459 days ago

It's hilarious that they're so out of touch that the JSON-LD and SHACL people are the realists. JSON-LD is ridiculous, a desperate attempt to attach semantic web technology to something that is actually used. It was announced with the misleadingly titled post "JSON-LD and Why I hate the Semantic Web". Only it's completely about the semantic web: it is just a serialization format for RDF expressed in JSON. So you take JSON, a format that can be described in a single page, and layer this monstrosity over it. For what? The best thing that can be said for it is you can ignore it (maybe) and just treat it as JSON. There was zero need for JSON-LD. They had a perfectly good serialization format in Turtle. It was about as easy to describe and understand as JSON, but nooooo. They're always riding the coattails of some other popular technology, trying to get a free ride. It's like that obnoxious kid who shows up to the party and tells you how much smarter they are than you and then complains that no one will talk to them.
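To make the "ignore it (maybe)" point concrete, here is a minimal sketch that reads a small JSON-LD document with nothing but the standard `json` module. The document, the `example.org` identifier and the `plain_fields` helper are made up for illustration. It only works for simple documents like this one; once a context aliases keywords or changes what a key means, plain JSON handling no longer tells you the whole story, which is the "maybe".

```python
import json

# A tiny, hypothetical JSON-LD document. The "@context" maps the short key
# "name" to an IRI; a plain JSON consumer never looks at it.
DOC = """
{
  "@context": {"name": "http://schema.org/name"},
  "@id": "http://example.org/people/alice",
  "name": "Alice"
}
"""

def plain_fields(text):
    """Parse as ordinary JSON and drop every key that starts with '@'."""
    data = json.loads(text)
    return {key: value for key, value in data.items() if not key.startswith("@")}

if __name__ == "__main__":
    print(plain_fields(DOC))
```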

SHACL is almost dumber than JSON-LD, if you can believe that. At least JSON-LD is just a serialization format. If you can manage to get it to parse, you can just reserialize it to a sane format like Turtle and get rid of the stupid. With SHACL you're stuck with it. So you go and create the world's slowest database by basically normalizing the hell out of it, because if a little normalization is good, a lot of normalization is better. Screw knowing anything about the world. Let's throw that all out the window and allow people to express anything.

So now you've got a database that can express anything and people say, "hey, I actually know some things about the world that seem to hold and my database is becoming a ridiculous mess. Can we make it so that people can't express just anything, like I already can with my relational database?" Well, the semantic web people went off in deep thought for a decade and finally came back with SHACL. It's a constraints language, expressed in RDF, of course, and if you express it in JSON-LD that means you've got SHACL expressed in RDF expressed in JSON, joy. I guess you could implement it in a number of different ways, but it ends up firing off a series of queries saying, "Is this ok? What about that? How about this?" and if they're all successful then it will allow you to run the query you actually wanted to run. So now after all that you've got the world's slowest database that is now orders of magnitude even slower so that it operates a bit like MySQL.
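A rough sketch of that "is this ok? what about that?" pattern, for anyone who hasn't seen it. This is not SHACL and not how any particular engine implements it; the triple store is a plain Python set, and `PERSON_SHAPE`, `validate` and `insert_if_valid` are hypothetical names. The point is only that every constraint turns into one more scan before the write you actually wanted.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
# This only mimics the "ask a series of questions" pattern, not real SHACL.

def count_values(triples, subject, predicate):
    """One 'query': how many objects does subject have for predicate?"""
    return sum(1 for s, p, _ in triples if s == subject and p == predicate)

# Hypothetical shape: every ex:Person needs exactly one ex:name
# and at most one ex:birthDate. Values are (min_count, max_count).
PERSON_SHAPE = {
    "ex:name": (1, 1),
    "ex:birthDate": (0, 1),
}

def validate(triples, target_class, shape):
    """Return a list of violations; each constraint costs one more scan."""
    violations = []
    targets = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    for subject in sorted(targets):
        for predicate, (low, high) in shape.items():
            n = count_values(triples, subject, predicate)
            if not low <= n <= high:
                violations.append((subject, predicate, n))
    return violations

def insert_if_valid(triples, new_triples, target_class, shape):
    """Apply the write only if the resulting graph passes every check."""
    candidate = triples | set(new_triples)
    problems = validate(candidate, target_class, shape)
    if problems:
        return triples, problems
    return candidate, []

GRAPH = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
}
```

Adding a second `ex:name` for `ex:alice` through `insert_if_valid` is rejected and the original graph comes back unchanged; adding a single `ex:birthDate` goes through.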

Semantic web databases let you express just about any query you'd like, but most of what they let you express are queries you're never going to run, and they are extremely slow for the ones you are.

3 comments

j-pb 1459 days ago

I'm not gonna add much, because I think we're on a similar page in terms of RDF-rantyness. I find the entire semantic web/linked data space horribly bloated and overcomplicated.

However! There is something to be said about triples and normalisation. Is the general idea of triples a really good format for databases? Maybe not. Is it a really good format for knowledge representation? Yeah I think so.

Real-world knowledge is quite messy and riddled with exceptions. People from cultures without a last name. A stump is still a tree in the right context, even if it doesn't have a crown. A character in D&D might have traits that are completely uncommon for their class. The Greek gods for sleep, death or the night sky are both concepts and characters.

You can't model these things with a database, which is primarily tailored towards modelling the "inner world" of a computer.

The semantic web is still frustratingly bad at these things, because of description logics, OWL and their mind-share that _everything must be class based_ and enforced/checked at creation/load time.

In reality it's much better to throw all of that away, and just do duck typing at query time, by letting the consumer decide which entities they want to process. Sure some entities might not get processed, because they don't conform to the shape that the consumer expects, but that's a strength. A different system might consume them, they might be ignored indefinitely, or they are handled at a later time.
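As a minimal sketch of what I mean by duck typing at query time: facts are loaded without any class check, and each consumer names the predicates it needs and simply skips entities that don't have them. The facts and the `entities` / `matching` helpers are invented for illustration, not taken from any RDF library.

```python
from collections import defaultdict

def entities(triples):
    """Group (subject, predicate, object) triples into per-subject dicts."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    return grouped

def matching(triples, required):
    """Yield (subject, attrs) for subjects that have every required predicate."""
    for subject, attrs in entities(triples).items():
        if all(predicate in attrs for predicate in required):
            yield subject, attrs

# Nothing here is validated against a schema at load time.
FACTS = [
    ("hypnos", "name", "Hypnos"),
    ("hypnos", "personifies", "sleep"),    # both a character and a concept
    ("hypnos", "parent", "nyx"),
    ("person1", "given_name", "Sukarno"),  # no last name, and that is fine
    ("stump", "kind", "tree"),             # a tree without a crown
]

if __name__ == "__main__":
    # A genealogy consumer only cares about things with a name and a parent.
    for subject, attrs in matching(FACTS, {"name", "parent"}):
        print(subject, attrs["parent"])
```

The genealogy consumer only ever sees `hypnos`; `person1` and `stump` stay in the store untouched for some other consumer with a different shape in mind.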

The directedness of individual facts also allows you to implicitly encode "who makes this assertion", providing a mechanism to make distributed consistency much easier.

Limiting the number of columns (to 3) also allows you to materialise all possible indices (for each ordering), which is really interesting in combination with worst case optimal join algorithms.
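To illustrate the index part: with three columns there are only 3! = 6 orderings, so you can keep a sorted copy for each and answer any pattern of bound positions with a prefix range scan. The sketch below uses sorted Python lists and `bisect`; the `TripleIndex` class is hypothetical, and it shows only the index layout, not a worst-case optimal join on top of it.

```python
from bisect import bisect_left
from itertools import permutations

class TripleIndex:
    """Keeps one sorted copy of the triples per column ordering (6 in total)."""

    def __init__(self, triples):
        unique = set(triples)
        self.indices = {}
        for order in permutations(range(3)):
            self.indices[order] = sorted(
                tuple(t[i] for i in order) for t in unique
            )

    def lookup(self, s=None, p=None, o=None):
        """Return the triples matching the bound positions (None = wildcard)."""
        pattern = (s, p, o)
        bound = [i for i in range(3) if pattern[i] is not None]
        free = [i for i in range(3) if pattern[i] is None]
        order = tuple(bound + free)   # bound columns first, so they form a prefix
        index = self.indices[order]
        prefix = tuple(pattern[i] for i in bound)
        results = []
        # Binary search to the first row with this prefix, then scan the range.
        for k in range(bisect_left(index, prefix), len(index)):
            row = index[k]
            if row[:len(prefix)] != prefix:
                break
            triple = [None, None, None]
            for pos, column in enumerate(order):
                triple[column] = row[pos]   # map back to (s, p, o) order
            results.append(tuple(triple))
        return results

if __name__ == "__main__":
    idx = TripleIndex([
        ("a", "knows", "b"),
        ("a", "knows", "c"),
        ("b", "knows", "c"),
        ("a", "likes", "c"),
    ])
    print(idx.lookup(o="c"))
    print(idx.lookup(s="a", p="knows"))
```

The cost is six copies of the data, which is only tolerable because the tuple width is fixed at three.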

However, nothing in the RDF ecosystem makes use of these strengths. It's all rigid, classy, complex, slow and buggy, but I don't think that a heavily normalised knowledge base built on triples has to be that way.