by igorlev 2829 days ago
What happened is that the technology spawned by the Semantic Web "fad" is now absolutely everywhere, but it looks and works nothing like how people thought it would.

Freebase, after being bought by Google, became the foundation of the Google Knowledge Graph (aka "things not links"). This kicked off an arms race between all the major search providers to build the largest and most complete knowledge graphs (or at least keep pace with Google [1]). Instead of waiting for folks to tag every single page, it turned out that simple patterns cross-referenced across billions of pages were good enough to extract useful knowledge from unstructured text.
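
A rough sketch of that "simple patterns, cross-referenced" idea, in Python. This is only an illustration, not how any search provider actually does it: the regex, the `capital_of` relation name, the `extract_facts` helper, the `min_support` threshold and the sample pages are all made up here. A claim is kept only if enough independent pages agree on it.

```python
import re
from collections import Counter

# Hypothetical surface pattern: "<X> is the capital of <Y>".
PATTERN = re.compile(r"\b([A-Z][a-z]+) is the capital of ([A-Z][a-z]+)\b")

def extract_facts(pages, min_support=2):
    """Count pattern matches across pages and keep the well-supported ones."""
    counts = Counter()
    for text in pages:
        # A set, so a page that repeats the same claim only votes once.
        seen = {("capital_of", m.group(1), m.group(2))
                for m in PATTERN.finditer(text)}
        counts.update(seen)
    # Cross-referencing step: drop claims seen on too few pages.
    return {fact: n for fact, n in counts.items() if n >= min_support}

# Toy stand-in for "billions of pages"; the third page is noise.
pages = [
    "Paris is the capital of France. It is a large city.",
    "Everyone knows Paris is the capital of France.",
    "Lyon is the capital of France, according to this page.",
]
facts = extract_facts(pages)
print(facts)
```

The one-off wrong claim falls below the support threshold, which is roughly why noisy patterns can still work at scale.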

Some companies that had easier access to structured but dirty data (like LinkedIn and Facebook) were also able to use (and contribute to) all of that research by building their own knowledge graphs, with names like the Social Graph and the Economic Graph. Those in turn help power a decent amount of their search and ad targeting capabilities, and have spawned some interesting work [2].

All those knowledge graphs became a major part of Siri, Alexa and Google Home's ability to answer a wide range of natural language queries. They are also pretty fundamental to a lot of tech like semantic search, improved ecommerce search and a bunch of intent detection approaches for chatbots.

So yeah, while the technology and associated research did turn out to be incredibly useful, adding fancier meta-tags to pages was not the direction that paid off the most.

[1] https://ai.google/research/pubs/pub45634
[2] https://research.fb.com/publications/unicorn-a-system-for-se...

2 comments

The problem with all this is that Google, Facebook, LinkedIn et al. are private companies, so their knowledge graphs are, well, theirs.

The idea with the semantic web was that it would be open and it would belong to its users, not to some cabal of giant corporations that would use it to control the internets.

That notion of openness and co-authorship of the knowledge on the web is now as dead as the parrot in the Monty Python sketch. And we're all much the worse for it; see all the debates about privacy and ownership of personal information and, indeed, metadata.

IIRC, Common Crawl exposes the semantic data from the sites they crawl. One could build their own knowledge graph (or at least bootstrap one) from that and other available data sources (DBpedia, Wikidata etc.).
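
For what it's worth, a minimal sketch of the bootstrapping step, assuming the pages embed schema.org data as JSON-LD `<script>` blocks. Standard library only; `JsonLdExtractor`, `to_triples` and the sample page are hypothetical, and this says nothing about how Common Crawl itself packages the data.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page

def to_triples(item):
    """Flatten one JSON-LD object into (subject, predicate, object) triples.

    Simplification: only top-level string values of a single object are kept.
    """
    subject = item.get("@id") or item.get("name")
    return [(subject, key, value) for key, value in item.items()
            if key not in ("@context", "@id") and isinstance(value, str)]

# Made-up sample page.
page = ('<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Person", '
        '"@id": "https://example.org/ada", "name": "Ada Lovelace"}'
        '</script></head></html>')

parser = JsonLdExtractor()
parser.feed(page)
triples = to_triples(parser.items[0])
print(triples)
```

Triples like these could then be merged with DBpedia or Wikidata entries by matching on identifiers, which is where most of the real work would be.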
That's not sufficient. The "private" knowledge graphs of e.g. Google aren't "crawlable": they aren't public and don't rely (solely) on the sites. DBpedia + Wikidata + all other open data sources are not enough for a good knowledge graph that can compete (in terms of coverage, thoroughness, and recency of updates) with what the megacorps can afford to maintain behind closed doors.
Yup!

I made an observation about monetizing the Semantic Web when playing the role of the data/ontology provider. You provide all the data while Siri, Alexa and Google Home get the glory: https://news.ycombinator.com/item?id=18036041