by exogen 3537 days ago
> Has this been done before?

Depends what you mean by "this"! RDF [1] and most of the technology surrounding it and the "Semantic Web" are based on (subject, predicate, object) triples almost exactly like this, where each element is often a URI, and objects are often strings just like they are here.
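To make the triple idea concrete, here's a minimal Python sketch. It assumes plain tuples of strings stand in for RDF terms; real RDF distinguishes URIs, literals and blank nodes more carefully. The `example.org` URIs and the `objects` helper are made up for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # usually a URI
    predicate: str  # usually a URI
    obj: str        # a URI or a plain string literal

EX = "http://example.org/"  # hypothetical namespace, not a real dataset

graph = {
    Triple(EX + "Earth", EX + "name", "Earth"),     # object is a string literal
    Triple(EX + "Earth", EX + "orbits", EX + "Sun"),  # object is another URI
    Triple(EX + "Sun", EX + "name", "Sun"),
}

def objects(graph, subject, predicate):
    """Return every object stored for a (subject, predicate) pair."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}

print(objects(graph, EX + "Earth", EX + "orbits"))  # {'http://example.org/Sun'}
```

Everything is just a set of three-part statements, so adding a new kind of fact never requires a schema change.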

It has even taken this idea to the next level: the statement expressed by such a triple can itself be given an "anonymous" ID, which can then be used as a subject or object, meaning you can make meta-statements about the statement itself, all while still using this simple system of triples.
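Here's a rough sketch of that using RDF's standard reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`), with a blank node like `_:b0` as the "anonymous" ID. The `reify` helper, the `example.org` URIs and the `statedBy` predicate are hypothetical. Note that reifying a triple this way only describes it and does not assert it; a graph would usually contain the original triple too.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace

_blank_ids = itertools.count()  # source of fresh blank-node labels

def reify(graph, s, p, o):
    """Describe the statement (s, p, o) with a fresh blank node and return that node."""
    node = f"_:b{next(_blank_ids)}"
    graph.add((node, RDF + "type", RDF + "Statement"))
    graph.add((node, RDF + "subject", s))
    graph.add((node, RDF + "predicate", p))
    graph.add((node, RDF + "object", o))
    return node

graph = set()
stmt = reify(graph, EX + "Earth", EX + "orbits", EX + "Sun")

# A meta-statement: the statement's ID is now an ordinary subject.
graph.add((stmt, EX + "statedBy", EX + "SomeSource"))  # hypothetical predicate
```

The point is that no new machinery is needed; statements about statements are still just triples.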

There are even entire languages built around querying graphs of such triples: https://www.w3.org/TR/sparql11-query/
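The core idea behind SPARQL's basic graph patterns fits in a few lines: terms starting with `?` are variables, each pattern is matched against every triple, and bindings are joined across patterns. This is a toy sketch, not SPARQL itself (no FILTER, OPTIONAL, etc.). The `ex:` names are shorthand for full URIs, and all helper names are hypothetical.

```python
def is_var(term):
    """Treat any pattern term starting with '?' as a query variable."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on mismatch."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable that is already bound must agree with this position.
            if p_term in extended and extended[p_term] != t_term:
                return None
            extended[p_term] = t_term
        elif p_term != t_term:
            return None
    return extended

def query(graph, patterns):
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

graph = {
    ("ex:Earth", "ex:orbits", "ex:Sun"),
    ("ex:Moon", "ex:orbits", "ex:Earth"),
    ("ex:Sun", "ex:name", "Sun"),
}

# Roughly: SELECT * WHERE { ?x ex:orbits ?y . ?y ex:name ?n }
print(query(graph, [("?x", "ex:orbits", "?y"), ("?y", "ex:name", "?n")]))
# [{'?x': 'ex:Earth', '?y': 'ex:Sun', '?n': 'Sun'}]
```

The Moon row drops out in the join because `ex:Earth` has no `ex:name` triple.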

DBpedia [2] is one such project that attempts to encode data from Wikipedia as triples like this; their About page says the 2014 version of the database had 3 billion triples, so that number is probably much higher now. Here's a preview if you want to see what these triples look like:

• Homepages of things: http://downloads.dbpedia.org/preview.php?file=2015-10_sl_cor...

• Genders of things: http://downloads.dbpedia.org/preview.php?file=2015-10_sl_cor...

etc. You'll notice that RDF predicates are all namespaced by URIs; that way you can unambiguously know in what sense "homepage" and "gender" are used (consider more ambiguous properties like "length"). That means there can be other uses of "homepage", "gender", "length" etc. that mean different things, and those will be namespaced by a different URI.
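A small sketch of why the namespacing matters: two predicates can share the bare name "length" and still be completely different properties, because their full URIs differ. The FOAF namespace below is the real one; the `geo` and `media` namespaces and the helper functions are hypothetical.

```python
import re

PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",   # the real FOAF vocabulary namespace
    "geo": "http://example.org/geometry#",  # hypothetical namespace
    "media": "http://example.org/media#",   # hypothetical namespace
}

def expand(prefixed_name):
    """Expand a prefixed name such as 'geo:length' into its full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local

def local_name(uri):
    """Return the bare name after the last '#' or '/' of a URI."""
    return re.split(r"[#/]", uri)[-1]

road_length = expand("geo:length")     # e.g. a distance
track_length = expand("media:length")  # e.g. a song's duration
print(local_name(road_length) == local_name(track_length))  # True: same bare name
print(road_length == track_length)                          # False: different predicates
```

Comparing full URIs rather than bare names is what keeps the two senses of "length" apart.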

Anyway, this Outpan project is obviously a looser, more freeform version of that, but only slightly. RDF is not very strict at all; it's just that people have thought a lot about how to successfully model the entire world's information, so real-world RDF ontologies end up looking somewhat complicated. I'm not sure whether a freeform version like this has been widely attempted before.

[1] https://en.wikipedia.org/wiki/Resource_Description_Framework
[2] http://wiki.dbpedia.org/


To borrow a subject matter that's currently popular on the Outpan homepage, here are the first 500 facts DBpedia knows about Donald Trump:

    SELECT DISTINCT ?property ?value WHERE {
        <http://dbpedia.org/resource/Donald_Trump> ?property ?value
    } LIMIT 500
Results: http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbp... (although, note, not every dataset they have is loaded into their SPARQL endpoint)

As you can see, there are a lot of metadata-type properties, but scroll down and you can see his birthdate, children, alma mater, etc.

This page is just a prettified version of that data: http://dbpedia.org/page/Donald_Trump
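For anyone curious what the query above actually asks for, here's a rough in-memory equivalent: fix the subject, project `(property, value)` pairs, drop repeats (DISTINCT), and stop after a limit (LIMIT). It's a sketch over hypothetical `example.org` data, not DBpedia's. Also note that without an ORDER BY, SPARQL leaves result order unspecified, so "the first 500" means whatever order the endpoint happens to return.

```python
def select_distinct_props(triples, subject, limit=500):
    """Yield distinct (property, value) pairs for `subject`, stopping after `limit`.

    Mirrors the shape of the SPARQL query: one triple pattern with a fixed
    subject, DISTINCT over the projected pair, then LIMIT.
    """
    seen = set()
    for s, p, o in triples:
        if s != subject:
            continue
        pair = (p, o)
        if pair in seen:
            continue  # DISTINCT: skip rows already returned
        seen.add(pair)
        yield pair
        if len(seen) >= limit:
            return  # LIMIT reached

EX = "http://example.org/"  # hypothetical data
triples = [
    (EX + "Alice", EX + "knows", EX + "Bob"),
    (EX + "Alice", EX + "knows", EX + "Bob"),  # repeat (e.g. merged sources), removed by DISTINCT
    (EX + "Bob", EX + "knows", EX + "Alice"),  # different subject, ignored
    (EX + "Alice", EX + "name", "Alice"),
    (EX + "Alice", EX + "age", "30"),
]
print(list(select_distinct_props(triples, EX + "Alice", limit=2)))
```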

So, amongst other things, he is an intellectual and an environmentalist? DB uselessness confirmed!
This is great, thanks! I will look into adding the dbpedia.org data.
Check out https://www.wikidata.org/ for another similar project with additional data! Their keys tend to be more opaque [1], but otherwise it's a very similar approach.

[1] e.g. the key for "Earth" is Q2: https://www.wikidata.org/wiki/Q2

Would you be open to having a little chat via email? hi@outpan.com
This is exactly what I thought when I saw this project. Having worked with DBpedia data and the triple format <Subject, Predicate, Object>, I was just wondering how Outpan stole the idea of triples and packaged it as a new idea.
And then, there's also ConceptNet (http://conceptnet5.media.mit.edu/)
This seems interesting. Code for building ConceptNet5 [0].

[0] https://github.com/commonsense/conceptnet5

Why "stole" and not "reinvented"?