by subhobroto 2852 days ago
Fantastic question. A major component of a machine-generated ontology has to be a notability score; otherwise it would be practically impossible to store all entities (and their relationships).

Further, for this to scale, Diffbot has to have a way to align its entity IDs with IDs from other notable graphs like Wikipedia, Wikidata, Freebase, WordNet, or even Yelp; otherwise the data could be of diminished value.

How would I know that the "Cardi B" in my database with ID 321 and Wikidata ID Q29033668 is the same as Diffbot's "Cardi B" with ID 561?
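
To make the question concrete, here is a rough sketch of the kind of alignment I have in mind, joining two graphs on a shared external identifier. Everything in it is an assumption for illustration: the record layout, the `wikidata_id` field, and the idea that Diffbot's records would carry a Wikidata ID at all. It is not Diffbot's actual schema or API.

```python
# Hypothetical records. The field names and the presence of a Wikidata ID
# on the remote (Diffbot-like) side are assumptions, not a real schema.
local_entities = [
    {"id": 321, "name": "Cardi B", "wikidata_id": "Q29033668"},
]
remote_entities = [
    {"id": 561, "name": "Cardi B", "wikidata_id": "Q29033668"},
    {"id": 562, "name": "Cardi B", "wikidata_id": None},  # same name, no shared ID
]


def align_by_shared_id(local, remote, key="wikidata_id"):
    """Map each local entity ID to a remote entity ID via a shared external ID.

    A local entity maps to None when there is no candidate or when the
    match is ambiguous (more than one remote entity with the same key).
    """
    # Index remote entities by the external ID, skipping records without one.
    index = {}
    for record in remote:
        if record.get(key):
            index.setdefault(record[key], []).append(record["id"])

    matches = {}
    for entity in local:
        candidates = index.get(entity.get(key), [])
        matches[entity["id"]] = candidates[0] if len(candidates) == 1 else None
    return matches


matches = align_by_shared_id(local_entities, remote_entities)
```

Here local ID 321 lines up with remote ID 561 only because both carry Q29033668. The name alone would not settle it, since remote ID 562 is also called "Cardi B". Without some shared identifier like this, I would be left with fuzzy matching on names and attributes.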