by emw 3885 days ago
The Wikidata taxonomy is basically the successor to Wikipedia's category tree. It not only irons out language-based differences (e.g. the category tree being different among Chinese, Spanish, English, etc. Wikipedias), but also captures the idea of generalization through a more semantically meaningful relation. This Wikidata concept tree is constructed with "subclass of" (P279) [1], a property that expresses the proposition "all instances of these items are also instances of those items". The goal is to have a subsumption hierarchy that classifies all human knowledge.

There's an RDF/OWL export of the Wikidata taxonomy available at [2] as wikidata-taxonomy.nt.gz, which can be explored with Semantic Web browsers like Protege [3].

Another fundamental relation -- "part of" (P361) [4] -- expresses mereological relationships. For (oversimplified) example: "iris part of eye", "eye part of head", "head part of body", etc. Both "subclass of" and "part of" are transitive.
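To make the transitivity point concrete, here is a minimal sketch using only the toy "part of" statements from the example above. The `PART_OF` table and the `ancestors` helper are illustrative names, not anything Wikidata provides; the same walk would apply to "subclass of" edges.

```python
# Toy "part of" (P361) edges taken from the example above.
# Each item maps to a set because a real item may be part of several things.
PART_OF = {
    "iris": {"eye"},
    "eye": {"head"},
    "head": {"body"},
}


def ancestors(item, edges):
    """Return every item reachable from `item` by following edges transitively."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, ()):
            if parent not in seen:  # also guards against cycles in messy data
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("iris", PART_OF)))  # ['body', 'eye', 'head']
```

Because the relation is transitive, "iris part of body" never has to be stated explicitly; it falls out of the traversal.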

A separate comment of mine in this discussion [5] describes how to traverse the "subclass of" tree in the Wikidata UI and in a third-party tool called Wikidata Generic Tree. The same principle applies to the "part of" tree. The latter gets less attention, but is also quite interesting.

---

1. https://www.wikidata.org/wiki/Property:P279

2. http://tools.wmflabs.org/wikidata-exports/rdf/index.php?cont...

3. http://protege.stanford.edu/

4. https://www.wikidata.org/wiki/Property:P361

5. https://news.ycombinator.com/item?id=10448573


Superb response. Deeply informative.

I am very excited about the knowledge engineering possibilities opened up by these large structured datasets.

I believe that at the very least we're going to have within a generation a machine-generated ontology to rival Kant and Aristotle. Then we'll have to figure out if this tells us more about how we've digitally organized the knowledge we have or whether it does in fact reveal something about reality and being.

Besides 'subclass of' and 'part of', are there any other taxonomic ways for concepts to relate to other concepts? There are parallels here, of course, with object-oriented programming. It's funny: I only started reading up on mereology[0] within the last year or so, but as soon as one starts thinking about concepts and their relationships, one ends up there eventually. 'part of' is like encapsulation. 'subclass of' is like inheritance. Is there more?

[0] (from the Greek μερος, ‘part’) http://plato.stanford.edu/entries/mereology/

Yes, there's also 'instance of' (P31) [1]. Together, 'instance of', 'subclass of' and 'part of' comprise Wikidata's basic membership properties [2].

'Instance of' and 'subclass of' provide Wikidata with a way to express the basic philosophical notion of type-token distinction [3]. For things that are a subclass of something like 'material entity', all instances are physical objects that have a unique location in space and time.

Not all instances are spatiotemporal particulars, though. For example, one might say "Homo sapiens instance of taxon", where taxon is a metaclass, i.e. a class in which the instances are classes. (Here 'taxon' would not be a subclass of 'material entity' -- i.e. taxa are information artifacts, not physical objects.) Support for this kind of "punning" via metamodeling is a major feature of OWL 2 DL [4].
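A rough sketch of how 'instance of' and 'subclass of' interact, including the punning case where one item is both a class and an instance. The statements below are toy data made up for illustration (Socrates and 'animal' do not appear in the discussion above), and `superclasses` / `types_of` are hypothetical helper names, not a Wikidata or OWL API.

```python
# Toy statements. "Homo sapiens" appears twice on purpose: it is a class
# (it has instances and superclasses) and also an instance of the metaclass "taxon".
SUBCLASS_OF = {
    "Homo sapiens": {"animal"},
    "animal": {"material entity"},
}
INSTANCE_OF = {
    "Socrates": {"Homo sapiens"},
    "Homo sapiens": {"taxon"},
}


def superclasses(cls):
    """All classes reachable from `cls` via 'subclass of', transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def types_of(item):
    """Direct 'instance of' classes plus everything those classes subclass."""
    result = set(INSTANCE_OF.get(item, ()))
    for cls in list(result):
        result |= superclasses(cls)
    return result


print(sorted(types_of("Socrates")))      # ['Homo sapiens', 'animal', 'material entity']
print(sorted(types_of("Homo sapiens")))  # ['taxon']
```

Note that 'instance of' is not followed transitively: Socrates is an instance of a material entity, but not an instance of 'taxon', and Homo sapiens taken as an instance is not a material entity.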

If this sort of thing interests you, definitely take a look into Wikidata [5]. The project will be a sea change for several key features in Wikipedia (e.g. infoboxes), and will likely be a main hub of the Semantic Web.

---

1. https://www.wikidata.org/wiki/Property:P31

2. https://www.wikidata.org/wiki/Help:Basic_membership_properti...

3. http://plato.stanford.edu/entries/types-tokens/

4. http://www.w3.org/TR/owl2-primer/

5. https://www.wikidata.org

Fantastic, I've read through your entire comment history :)

I'm familiar with OWL and RDF. I've been using SPARQL and DBpedia; I'll switch to SPARQL and Wikidata if you think that's the way to go. How do you see the overlap between DBpedia and Wikidata?

I'm concerned that there's going to be a knowledge grab by corporations and (perhaps) government entities. I fear that the knowledge graphs inside the big G, FB, Yandex, Apple, MS and so on, built to power their search engines and personal assistants, will be orders of magnitude more sophisticated, complex and comprehensive than what will be available to open access research. Witness Freebase. Would you say my fears are misplaced?

I've read that SEP article, and I've also read a good bit of Peirce's original journal article. As it says in SEP, "It should be mentioned that for Peirce there is actually a trichotomy among types, tokens and tones,[...]". I think it's amusing that basically everybody ignores the triadic distinction Peirce argued for in favor of a dualistic type/token distinction.

I'm looking forward to going through your tutorial quill in hand and pot of ink at the ready.

I think SPARQL and Wikidata are the way to go.
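For a flavor of what that looks like, here is a hedged sketch that only builds a query and a request URL, without sending anything. It assumes the public Wikidata Query Service endpoint at query.wikidata.org and its predefined `wd:` / `wdt:` prefixes; Q5 (the item for "human") is just a convenient example, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def superclass_query(item_id):
    """SPARQL for all classes reachable from an item via 'subclass of' (P279).

    The `*` is a SPARQL 1.1 property path: zero or more P279 hops,
    so the transitive walk is done by the query engine.
    """
    return f"SELECT ?class WHERE {{ wd:{item_id} wdt:P279* ?class }}"


def request_url(item_id):
    """URL-encode the query as a GET request asking for JSON results."""
    params = {"query": superclass_query(item_id), "format": "json"}
    return ENDPOINT + "?" + urlencode(params)


print(superclass_query("Q5"))
# SELECT ?class WHERE { wd:Q5 wdt:P279* ?class }
```

Swapping `wdt:P279*` for `wdt:P361*` would walk the "part of" tree instead.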

Regarding Wikidata and DBpedia: to my understanding, the latter gets much of its content by scraping Wikipedia infoboxes. Wikidata will increasingly provide the data for those infoboxes, and thus for DBpedia.

Regarding your fears: I don't share them. Wikidata will greatly enhance the accessibility of knowledge for open access research.