Hacker News
by the-alchemist 945 days ago
There are a lot of Protege tutorials out there, but here's what an ontology (in the OWL sense of the word) lets you do. The "pizza ontology" is the Hello World of the ontology world.

* imagine a graph of things you wanna organize ("model"), say world religions, or the plant and animal kingdoms
* you can tell the system that anything that's a plant can never be an animal, or that viruses can't be bacteria
* or that lions and zebras are both mammals
* you can define what mammals are: vertebrae, heart, brain, etc.

The interesting part is ontology _validation_ or querying.

Is it internally consistent? Maybe you specified viruses and bacteria and said they are never the same thing, but the way you modeled it, they are identical! Hmm, you'll have to update your definition of bacteria, or viruses, or both.

Next, you try to put fungi in the system, but there's an error because fungi do not belong to the plant or animal kingdom: they are their own thing.
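To make the consistency idea above concrete, here is a minimal Python sketch. It is a toy, closed-world style check, not how an OWL reasoner works internally. The class names, the `Euglena` modeling mistake, and the "every living thing is a plant or an animal" covering rule are all assumptions made up for illustration.

```python
# Toy model: a class hierarchy, one disjointness rule, and one covering rule.
# Every name below is illustrative and not taken from a real ontology.
SUBCLASS_OF = {
    "Lion": {"Mammal"},
    "Zebra": {"Mammal"},
    "Mammal": {"Animal"},
    "Animal": {"LivingThing"},
    "Plant": {"LivingThing"},
    # Hypothetical modeling mistake: declared as both a plant and an animal.
    "Euglena": {"Plant", "Animal"},
    # Fungi are placed directly under LivingThing, in neither kingdom.
    "Fungus": {"LivingThing"},
}
# Plants can never be animals.
DISJOINT = [{"Plant", "Animal"}]
# Assumed covering rule: every LivingThing must be a Plant or an Animal.
COVERING = {"LivingThing": {"Plant", "Animal"}}


def ancestors(cls):
    """Return cls plus every superclass reachable through SUBCLASS_OF."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def check(cls):
    """Return a list of human-readable problems found for one class."""
    problems = []
    supers = ancestors(cls)
    for pair in DISJOINT:
        if pair <= supers:
            problems.append(f"{cls} falls under disjoint classes {sorted(pair)}")
    for root, parts in COVERING.items():
        if root in supers and cls != root and not (parts & supers):
            problems.append(f"{cls} is a {root} but none of {sorted(parts)}")
    return problems


if __name__ == "__main__":
    for name in ("Lion", "Euglena", "Fungus"):
        print(name, check(name))
```

`Lion` passes, `Euglena` trips the disjointness rule, and `Fungus` trips the covering rule. An actual OWL reasoner works under the open-world assumption and would tend to draw inferences here rather than report a validation error, so treat this only as a picture of the kind of contradiction being described.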

So this is a fairly simplistic use case, but scale this up to hundreds if not thousands of entities and you can start to see the value.

Imagine sticking the human genome in there, and which drugs act on which chromosomes, etc.

It's a niche, for sure, but if you need something like reasoning, it's the way to go.


The main issue with ontologies, and the reason why they're not popular outside a few niche cases, is that they try to solve a fundamentally unsolvable problem: getting (a large number of) humans to agree on a "correct" modelling of something non-trivial.

When you narrow down the domain to something where a consensus on representation can be reached, then sure, reasoning is a plausible use case... except for the fact that it scales very poorly, and making it work on a set of data large enough to be interesting requires a disproportionate amount of computing power.

Yes, consensus in ontology building has traditionally been a huge drag on the adoption of ontologies. While it's not strictly required, having consensus about ontologies can obviously increase their utility. At the same time I think it's important to have explicit dissent (differing world views) and give both room to grow, rather than trying to create the "one true" view of the world.

However, I don't think the core issue is consensus itself, but instead that the prevalent form of consensus in the ontology authoring space is consensus by committee rather than consensus by usage (as is usual in the open source software space).

That's why I have in the past been involved in creating Plow[0], a package manager for ontologies, with the aim of bringing the same "grassroots" nature and network effects that you find in other open source ecosystems to ontology engineering.

[0]: https://plow.pm/

Do "stochastic ontologies" exist? You define probabilities for certain attributes and category assignments, then you do a maximum likelihood estimate over all unknowns, which yields the most likely internally consistent world model.

Yes, you'll find something under the keywords Probabilistic Ontologies and Bayesian Ontology Reasoner.

Isn't an LLM essentially a stochastic ontology? Maybe that's why LLMs generalize so well to problems you wouldn't think would be amenable to next word prediction based on text analysis.

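As a rough illustration of the "most likely, internally consistent world model" idea asked about above, here is a brute-force Python sketch. The probabilities and the two hard constraints are invented for the example, and the attributes are assumed to be independent. It does not reflect how any particular probabilistic ontology reasoner is implemented.

```python
from itertools import product

# Hypothetical probabilities that one unknown organism belongs to each class.
PRIOR = {"Plant": 0.6, "Animal": 0.7, "Mammal": 0.2}


def consistent(world):
    """Hard constraints: Plant and Animal are disjoint; every Mammal is an Animal."""
    if world["Plant"] and world["Animal"]:
        return False
    if world["Mammal"] and not world["Animal"]:
        return False
    return True


def likelihood(world):
    """Probability of a full assignment, assuming independent attributes."""
    p = 1.0
    for name, value in world.items():
        p *= PRIOR[name] if value else 1.0 - PRIOR[name]
    return p


def most_likely_world():
    """Enumerate every true/false assignment and keep the best consistent one."""
    names = list(PRIOR)
    worlds = (
        dict(zip(names, values))
        for values in product([True, False], repeat=len(names))
    )
    return max((w for w in worlds if consistent(w)), key=likelihood)


if __name__ == "__main__":
    print(most_likely_world())
```

Taken independently, the numbers favor "Plant and Animal", but that world is ruled out by the disjointness constraint, so the search settles on "Animal only". Enumeration is exponential in the number of attributes, so this only conveys the shape of the idea.
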
> At the same time I think it's important to have explicit dissent (differing world views) and give both room to grow, rather than trying to create the "one true" view of the world.

You can embed this into the ontology itself, e.g. by creating classes/entities such as InPeterView, InMaryView, etc.
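A small Python sketch of one way to picture this. Note that it swaps in a slightly different technique: instead of defining view classes, it tags every statement with the view that asserts it. The `PeterView` and `MaryView` names and the Pluto facts are hypothetical.

```python
from collections import defaultdict

# Each statement is (subject, predicate, object, view); the fourth slot
# records whose world view asserts it. All names are hypothetical.
STATEMENTS = [
    ("Pluto", "is_a", "Planet", "PeterView"),
    ("Pluto", "is_a", "DwarfPlanet", "MaryView"),
    ("Pluto", "orbits", "Sun", "PeterView"),
    ("Pluto", "orbits", "Sun", "MaryView"),
]


def by_view(statements):
    """Group the bare triples under the view that asserts them."""
    views = defaultdict(set)
    for subject, predicate, obj, view in statements:
        views[view].add((subject, predicate, obj))
    return views


def shared(statements):
    """Triples that every view agrees on (expects at least one statement)."""
    return set.intersection(*by_view(statements).values())


if __name__ == "__main__":
    print(shared(STATEMENTS))
```

Both views can coexist in one data set, and `shared` picks out the part they agree on, which is the part that could in principle be co-maintained.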

Yes, I am very aware of that. However, realistically, Party1 with WorldView1 will be in charge of maintaining WorldView1 in their ontology document, and it is better to leave Party2 to maintain their WorldView2 in their own separate ontology document.

Of course sometimes there is a need to reconcile both world views, and swaths of literature have been written about ontology alignment. Ideally the parties would also share the things that they agree on and co-maintain them in separate ontology documents, though in practice this doesn't happen nowadays due to a lack of ontology engineering tooling.

> co-maintain them in separate ontology documents, though in practice this doesn't happen nowadays due to a lack of ontology engineering tooling.

There are multiple efforts to build core standard ontologies (e.g. schema.org), which can then be used as a common vocabulary.

And for good reason they don't gain widespread adoption. E.g. schema.org is barely used outside of making your website easier for Google to scrape - it is an (indirect) Google project after all.

The only "core" ontologies that have really found adoption over the decades are the ones that everyone is forced to use as they are baked into the standards (RDF/RDFS), and Dublin Core for metadata (where only 5 of the ~100 terms are commonly used).

Why does it have to be something non-trivial? Why do a lot of humans need to agree?

You can have an ontology that is used only by you. Maybe 1,000 people need to agree, and they would probably be on your payroll. It could be something trivial and already more or less decided, like movie metadata, etc. It's there just to power your internal systems, not for humanity to agree upon.

For popular use, it really comes down to the tooling. If I take this knowledge that I already have and write an ontology for it, what do I have to gain? Sadly, with the current state of tooling, you gain nothing.

Ontologies are behind some of the systems that help map between different models used by different groups for things in the same space (for example, mapping between different ways of interpreting and communicating medical data through HL7 messages).

An ontology doesn't have to settle on a single correct model - in fact, I'd say such an ontology is particularly poor, and a technology that limits you to that is too limited to be used in the ontology field.

But Wikidata is trying

Those who say it can't be done should not interrupt the people doing it, I suppose!

You're right that it comes down to just domain modeling, but institutions that don't require some kind of democratic consensus (say, inventory systems for individual companies) do not always need that unless you plan on exposing the data to others. This is the distinction between "linked data" and "semantic web".

It's generally more interesting to apply validation to data than to ontologies themselves. OWL makes this harder, because it rejects two assumptions that are commonly used in real-world modeling: (1) the Unique Name Assumption, under which every object in the domain is described by a single entry in your data model. By contrast, OWL will always try to conflate different entries in order to resolve logical consistency issues that arise from your model; (2) the Closed-World Assumption on relations. OWL rejects this and assumes that your data about the relations or properties in any given model is always incomplete. Its reaction to issues that crop up in your modeling is to enforce logical consistency by adding "inferred" property instances to your data, as opposed to simply flagging the issue for validation. Real-world technologies like SHACL and ShEx work on very similar logical principles, such as description logic (https://news.ycombinator.com/item?id=31890041), but avoid these pitfalls.
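The contrast between the two readings can be sketched in a few lines of Python. This is a toy under stated assumptions, not OWL or SHACL themselves: the `birth_mother` property, the facts, and both function names are hypothetical, and the rule being modeled is "at most one value per subject".

```python
from collections import defaultdict

# Hypothetical data: each person should have at most one birth mother.
FACTS = [
    ("alice", "birth_mother", "mary"),
    ("alice", "birth_mother", "maria"),
    ("bob", "birth_mother", "carol"),
]


def values_by_subject(facts, prop):
    """Collect the set of values each subject has for one property."""
    grouped = defaultdict(set)
    for subject, predicate, obj in facts:
        if predicate == prop:
            grouped[subject].add(obj)
    return grouped


def infer_same_as(facts, prop):
    """Inference-style reading (no unique names): several values of an
    at-most-one property are taken to denote the same individual."""
    return [
        tuple(sorted(vals))
        for vals in values_by_subject(facts, prop).values()
        if len(vals) > 1
    ]


def validate_max_one(facts, prop):
    """Validation-style reading (closed world, unique names): more than
    one value is reported as a violation instead of being explained away."""
    return [
        f"{subject}: {len(vals)} values for {prop}"
        for subject, vals in values_by_subject(facts, prop).items()
        if len(vals) > 1
    ]


if __name__ == "__main__":
    print(infer_same_as(FACTS, "birth_mother"))
    print(validate_max_one(FACTS, "birth_mother"))
```

On the same data, the first function quietly concludes that `maria` and `mary` are one person, while the second flags `alice` as having too many values. That is roughly the difference between an OWL functional property and a SHACL `sh:maxCount 1` constraint.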