Hacker News
by zcw100 2376 days ago
I could write a book on what's wrong with the semantic web. One of the worst problems isn't even technical; it's the community. There are some great people in the community but there are also a large number of extremely toxic people that drive people away. If the technology ever takes off it's going to be because some outside community cherry-picks the good parts and tells those people to f-off. That's already starting to happen, and you'll hear no end of bitching from people in the semantic web community about how outsiders are reinventing what the community already did years ago. Guess what? You're right. You're so toxic that it's worth redoing everything if it means they don't have to deal with the toxic attitudes.

> a large number of extremely toxic people that drive people away

It's funny you say that, because as soon as I saw "semantic web" I felt a kind of negative nostalgia. I can hardly remember all of the RDF/RSS/Atom stuff from way, way back, or what the trigger for that feeling is, but I just remember there being rancor swirling around the whole thing. I think there were some petty arguments about who deserved credit for the creation of the formats or something? Wasn't it between a bunch of bloggers? Then XHTML became a battleground, since some groups were trying to keep semantic tags out of it while other people wanted them in. I remember just feeling exhausted every time the subject came up, since it was like the emacs vs. vim or spaces vs. tabs wars.

The funny thing is, I believe in the promise of the semantic web. I recall Tim Berners-Lee declaring the next frontier was not open source but open data and I agree. He even co-founded an institute around it: https://theodi.org/person/sir-tim-berners-lee/

> I can hardly remember all of the RDF/RSS/Atom stuff from way …

You're mixing in some things that aren't really Semantic Web related.

RSS vs. Atom was less about the Semantic Web than a squabble between different XML formats, one very loosely specified, the other more ... well-formed. The Semantic Web did have a small foot in the RSS wars: the very first RSS (RSS 0.9 from Netscape) was RDF-based, and for a short time RSS 1.0 wanted to rebuild RSS on an RDF basis for the extensibility of the Semantic Web. But the later discussions were about the XML variants of RSS and then Atom: whether the spec was adequate, whether it was frozen, whether and how it should be fixed, etc.

The XHTML discussions were, in my recollection, less about elements than about parsing models. XHTML reformulated HTML as XML, which meant an error model with no error correction but failure on the first error. And XHTML 2 tried to evolve the structural elements by not being backward compatible and instead defining a somewhat different new dialect. The backlash against XHTML was against that. A group sponsored by the browser makers then formed which wanted to evolve HTML in a backwards-compatible way and to standardize the parsing of tag soup → HTML5.
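To make the two error models concrete, here is a minimal sketch using only the Python standard library. It assumes `xml.etree.ElementTree` as a stand-in for a strict XML parser and `html.parser` as a stand-in for a forgiving one; the latter is not the HTML5 parsing algorithm, and the `TAG_SOUP` string and helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed markup: <b> and <i> are closed in the wrong order.
TAG_SOUP = "<p>Hello <b>bold <i>both</b> italic</i></p>"


class TagCollector(HTMLParser):
    """Records every start tag the lenient parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(markup):
    """XML rules: give up (return None) at the first well-formedness error."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


def parse_as_tag_soup(markup):
    """Lenient rules: keep going and report the start tags that were seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


xml_result = parse_as_xml(TAG_SOUP)
soup_result = parse_as_tag_soup(TAG_SOUP)
print(xml_result)   # None: the XML parser rejected the whole document
print(soup_result)  # the lenient parser still saw every start tag
```

The same input yields nothing at all under XML rules and a usable stream of tags under lenient rules, which is roughly the trade-off the XHTML vs. HTML5 argument was about.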

("Semantic elements" were often shorthand for "instead of a dumb div, use the appropriate HTML element". That was more the quest of the Web Standards Project than of the Semantic Web.)

(Slight overlap: how to embed Semantic Web statements has a small relationship with XHTML. RDFa started, imho, in an XHTML 2 module.)

I somewhat miss that time. All these bloggers with an interest in web standards and how to do them best had their own idealism, and the cross-blog and W3C discussions were always interesting. Today web standards don't have that publicity and idealism anymore; they seem more like an engineering collaboration of the 2½ big browser makers, who get to decide among themselves. Maybe it was always so, but it seemed different at that time.

Our recollections of the history are similar; however, I also recall there being discussion about preventing semantic tags from being included in XHTML. A certain segment of the population believed they didn't belong in the document but rather in a separate, accompanying document in RDF or whatever (an argument of data normalization vs. denormalization).

Atom/RSS was involved in the debate because they were also trying to solve the metadata issue. Things like "author", "publish date", etc. are just as relevant to aggregation/syndication formats as they are to the document itself. Again, I'm summoning my fallible memory here, but one argument was that if the metadata is relevant to both documents then it ought to be stored separately and linked to the HTML/RSS docs using a URL.

XHTML was involved because as an XML format it was conceivable to store your metadata separately AND to use XSLT to transform it into your XHTML/RSS/Atom document on demand. So RDF, Atom, RSS and XHTML authors all wanted a say on a metadata format that would suit all of those use cases. That is a tall order.

My personal feeling about the death of XHTML is that it wasn't one big thing that killed it. It was hundreds of smaller disputes like this one.

It's interesting how perceptions vary. I've been working with SemWeb stuff for a decade or so, and I have never experienced what you describe here:

> One of the worst isn't even technical, it's the community. There are some great people in the community but there are also a large number of extremely toxic people that drive people away.

Maybe it's just the subset of the community that I choose to deal with, but the folks on the Jena mailing lists (pre and post Apache) have always been very gracious and helpful in my experience. And Ralph Hodgson, one of the co-founders of TopQuadrant, came to a Triangle Java User's Group talk that I once gave on Semantic Web technologies, along with a bunch of other TopQuadrant people... and despite the fact that my company competes with them in certain areas, they were perfectly cordial and pleasant to interact with. Likewise for the other times that I've had TopQuadrant folks show up at events where I was speaking.

Maybe it's just dumb luck on my part, or whatever, but I have found no major issues with toxic people in the SemWeb community. shrug

Yes, there are some wonderful people in there. Andy Seaborne has always impressed me with his thoughtful responses. I won't call out any of the bad apples. Usually a question goes something like, "Uh, I'm new to the semantic web and I'd like to do X" and the response is, "this is how it works, you're a dummy and you need to understand how brilliant the semantic web is and you don't need to be doing what you're asking for" or academics who will complain that they're not getting enough credit for providing their brilliant intellectual scaffolding.

Databases that are run on a shoestring aren't stable, so we're going to make everything federated with linked fragments? Fine, give it a go, but you don't need to go on and on about how databases are inadequate because someone isn't willing to foot the AWS bill so they can host DBpedia for ya.

Let's have a go at JSON-LD. RDF/XML is finally recognized as a mistake. A somewhat reasonable mistake, because everyone was XML crazy at the time. So what do we do? The exact same thing, except this time it's JSON. But it's even worse. We choose a serialization that is prized for its simplicity and we foist the entire RDF stack onto it? Then they claim that JSON-LD isn't about the semantic web so we're good, and Jedi mind trick it with, "This isn't the RDF you're looking for".
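For readers who haven't seen the format, here is a rough sketch of the "plain JSON on the surface, RDF underneath" tension described above. The document, the `example.org` IRIs and the `naive_triples` helper are all made up for illustration, and the helper is a toy: it is not the JSON-LD expansion algorithm, it only handles a flat object with an inline `@context`, and it treats every value as a plain literal.

```python
import json

# A small JSON-LD document with an inline context (hypothetical data).
DOC = """
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": "http://schema.org/url"
  },
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "homepage": "http://example.org/alice"
}
"""


def naive_triples(doc):
    """Toy mapping of a flat JSON-LD object to (subject, predicate, value)."""
    context = doc["@context"]
    subject = doc["@id"]
    return sorted(
        (subject, context[key], value)
        for key, value in doc.items()
        if not key.startswith("@")  # skip JSON-LD keywords
    )


data = json.loads(DOC)

# View 1: an ordinary JSON consumer ignores the context entirely.
plain_view = data["name"]

# View 2: an RDF-minded consumer resolves each key to an IRI via the context.
triples = naive_triples(data)

print(plain_view)
for triple in triples:
    print(triple)
```

Both views come from the same bytes. The first needs nothing but a JSON parser; the second is where the rest of the RDF stack comes in.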

Because we aren't done overcomplicating simple things, we take aim at CSV with CSVW. Granted, CSV has some subtle complexities, but it's easy and reasonably compact. So now we're going to add metadata to CSV files with RDF and then serialize it into JSON as JSON-LD. Great. How do I find this metadata? Either at a well-known location or in a Link header. Whoops, that means I can't publish metadata that references your CSV file. Let's convert your CSV file to RDF. WTF. My 500 MB CSV file just became 1.5B triples and it's taking 8 hours to load it into my triple store!
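The blow-up in size is easy to see with a naive conversion that emits one triple per cell. The sketch below is only that: it is not the CSVW conversion procedure, and the `http://example.org/` namespace, the sample data and the `csv_to_triples` helper are invented for illustration.

```python
import csv
import io

# Tiny stand-in for a real CSV file (hypothetical data).
SAMPLE_CSV = """id,name,city
1,Alice,Raleigh
2,Bob,Durham
3,Carol,Cary
"""

BASE = "http://example.org/"  # hypothetical namespace


def csv_to_triples(text, base=BASE):
    """Yield one (subject, predicate, value) triple per cell."""
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{base}row/{row_number}"
        for column, value in row.items():
            yield (subject, f"{base}col/{column}", value)


triples = list(csv_to_triples(SAMPLE_CSV))
print(len(triples))  # one triple per cell: rows x columns
```

Every cell that was a few bytes in the CSV now carries a full subject IRI and predicate IRI, so the triple count grows as rows times columns and the serialized size grows faster than that.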

Don't get me started on people who call themselves ontologists. They're really zombies but instead of eating brains they eat budgets. They should be dispatched the same way, with a shotgun blast to the face. They generally can't justify their decisions even though there is a framework to do that, OntoClean. I have yet to meet one who even knew what that was. They just convince management that what they're doing is intellectually unattainable by mere developers, although they'd be lost without Protégé, TopBraid, or Excel, and what they produce is generally an incomputable pile of garbage. It's always OWL Full. "Class or property? Class or property? Well, is it an "is a" relationship?"

I'm done writing, so I'll just include a list of the half-baked ideas that sound good but are a day late and a dollar short: LDP, R2RML, ShEx, SHACL, DCAT, RDF Data Cube, WebID...

My wife always says to say something nice so I'm going to say SKOS. SKOS is ok.

> RDF/XML is finally recognized as a mistake.

Finally? From what I've seen, most everybody in the SemWeb community moved on to N3 or Turtle over a decade ago, with a little bit of interest in the aforementioned JSON-LD.

I'm a fan of SKOS myself.

I may remain more of a fan of SemWeb tech because I stay away from the edges. Out of LDP, R2RML, ShEx, SHACL, DCAT, RDF Data Cube, WebID..., I use none. Add GRDDL to the list of things I don't need as well.

Community aside, I'd love to read an article on what's good and bad in the current semantic web. Maybe it would have to be written anonymously.

Or maybe contact O'Reilly and write an intro book "Semantic Web: Just the Good Parts" for their series.

This is precisely why I left the rdf* mailing list shortly after it was created this year. The list maintainer invited subscribers to introduce themselves and as soon as a particular individual posted, I no longer wanted to be a member of the list.

I'm curious what those toxic attitudes are. Surely the "we already invented it and you're reinventing it" can't be the only case. I'm also curious whether it's in academia or in industry.

I shared the experience described by the grandparent. In particular I remember I had some argument on HN with a few people and the sheer amount of bad faith and technical inaccuracy thrown at me was jaw dropping. At this point I consider SW more a cult than a technology.

On the research side there are two kinds of research papers: the one that proposes an ontology for a domain, and the one that describes the conversion of an existing resource to RDF. I've never seen a paper where SW was used for something new and interesting and that would have been impossible without SW.

That being said, there are also both technical and conceptual pain points plaguing RDF. Basically the tech is trying to address too many things: both metadata and data, and every kind of data. The "IRIs that can be URLs that can be sometimes dereferenced and sometimes not, but it's better if they are and then it's Linked Data" kind of thing makes it hard to assume (and thus build) anything.

So, RDF has been a success in a few domains (biology), but in most cases it doesn't offer a real competitive advantage over simpler and more expressive technologies such as graph databases.

PS: @zcw100 if you were to really write a book about the semantic web, drop me a line please.

My take on the attitude in academia: Here we describe a set of algorithms that can solve a class of problems that previous algorithms can't. In the '60s someone published a solution to a problem, which we have improved upon with a novel innovation called "hyperlinks". The technical, social and economic shortcomings of our solution are invalid because it is decentralised and therefore morally superior to the current offerings, used the world over, of industry practitioners who are only doing it for the money. More funding is needed for further research.

In general, the fetishism of decentralisation isn't something that is big in academia (as in the academia that publishes papers). There are lots of issues in academia, and even more with the semantic web, but fetishism of decentralisation isn't one of them.