Hacker News
by dredmorbius 322 days ago
Having studied, and attempted to build, a few taxonomies / information hierarchies myself (a fraught endeavour; perhaps information is not in fact hierarchical? (Blasphemy!!!)), I'm wondering how stable the present organisational schema will prove, and how future migrations might be handled.

(Whether for this or comparable projects.)

<https://en.wikipedia.org/wiki/Taxonomy>

<https://en.wikipedia.org/wiki/Library_classification>


Clay Shirky's essay from 2005: Ontology is overrated (centred on Yahoo!'s directory of links, oddly enough)

https://web.archive.org/web/20191117161738/http://shirky.com...

Unexpectedly related to the problem of perfect classification is McGilchrist’s The Master and His Emissary. It shows that the human mind is a duet where each part exhibits a different mode of attending to reality: one seeks patterns and classifies, while the other experiences reality as an indivisible whole. The former is impossible to do “correctly”[0]; the latter is impossible to communicate.

(As a bit of meta: in making this argument, the book itself has to use the classifying approach. That does not defeat the point; it is rather a prerequisite for communicating it.)

Notably, in other animals (the two-mode division is probably common to every creature with two eyes and a brain), the classifying mode has been shown to engage when seeking food or interacting with friendly creatures. This highlights its ultimate purposes: consumption and communication, not truth.

In a healthy human both parts act in tandem, selectively inhibiting each other; I believe later sections go a bit into the dangers of prioritizing the classifying part to the exclusion of the other.

Because comprehensive and lossless classification is unattainable, presenting information in ways that allow different, competing taxonomies to coexist (e.g., tagging) is perhaps a worthy compromise: it still serves the communication requirement, but without locking into a local optimum.
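To make the tagging compromise concrete, here is a minimal sketch, assuming tags are stored as (scheme, term) pairs so that several competing vocabularies can sit side by side on the same item. The item names, schemes, and terms are hypothetical placeholders, and `build_index` / `find` are illustrative helpers, not any particular system's API.

```python
from collections import defaultdict

# Hypothetical catalogue: each item carries (scheme, term) tags, so terms from
# several competing classification schemes can coexist on one item.
ITEMS = {
    "shirky-2005": {("topic", "ontology"), ("topic", "web"), ("format", "essay")},
    "master-and-emissary": {("topic", "neuroscience"), ("topic", "philosophy"), ("format", "book")},
    "yahoo-directory": {("topic", "web"), ("topic", "ontology"), ("format", "website")},
}

def build_index(items):
    """Invert item -> tags into tag -> items; every tag becomes an entry point."""
    index = defaultdict(set)
    for item, tags in items.items():
        for tag in tags:
            index[tag].add(item)
    return index

def find(index, *tags):
    """Return the items carrying every given tag (set intersection)."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result

index = build_index(ITEMS)
print(sorted(find(index, ("topic", "ontology"))))
# ['shirky-2005', 'yahoo-directory']
print(sorted(find(index, ("topic", "web"), ("format", "essay"))))
# ['shirky-2005']
```

The relevant design choice for the migration question: adding a new scheme (say, a ("mood", ...) vocabulary) just means adding tags. Nothing already filed has to move, unlike re-parenting nodes in a single hierarchy.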

[0] I don’t recall off the top of my head exactly how Iain gets there (there is plenty of material), but similar arguments have been made elsewhere, e.g., Clay Shirky’s points about the inherent lossiness of any ontology and its impossible requirement of mind reading and fortune telling. Personally, I would extrapolate a point from Gödel's incompleteness theorem: we cannot pick apart and formally classify a system we ourselves are part of in a way that is both complete and provably correct.

Yes, the seeming hierarchy in information is a bit shallow. Yahoo, AltaVista and others tried this, and it soon became unmanageable. Google realized that keywords and page ranking were the way to go. I think keywords are sort of the same as dimensions in multi-dimensional embeddings.

Information is basically about relating something to other known things. A closer relation is interpreted as closer proximity in a taxonomy space.
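A toy way to see the "keywords as dimensions, relation as proximity" idea: treat each distinct keyword as one axis, count occurrences, and measure closeness with cosine similarity. This is a minimal bag-of-words sketch with made-up documents; real embeddings use dense, learned dimensions rather than raw keywords, so take it only as an illustration of the analogy.

```python
import math
from collections import Counter

def keyword_vector(text):
    """Bag-of-keywords vector: each distinct lowercase word is one dimension."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse keyword vectors (1.0 = same direction, 0.0 = no overlap)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents, reduced to a handful of keywords each.
docs = {
    "a": "library classification subject headings",
    "b": "library subject headings index",
    "c": "search engine page ranking",
}
vecs = {name: keyword_vector(text) for name, text in docs.items()}
print(round(cosine(vecs["a"], vecs["b"]), 3))  # 0.75: three shared keywords
print(round(cosine(vecs["a"], vecs["c"]), 3))  # 0.0: nothing shared
```

Note that nothing here is hierarchical: "a" and "b" are close because they share dimensions, not because they sit under a common parent node.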

Keywords also have their limitations.

The US Library of Congress is an interesting case study to my mind. Its original classification scheme came from Thomas Jefferson's private library: he sold the collection to Congress in 1815, after British troops burned the original Library of Congress in 1814, during the War of 1812. The present Library of Congress Classification replaced Jefferson's scheme around the turn of the 20th century and has since been made far more detailed (though so far as I know its 21 alphabetic top-level classes have stayed largely stable), and there's been considerable re-adjustment as knowledge, mores, and the world around us have changed. The classification has its warts, but it's also very much a living process, something I feel is greatly underappreciated.

At the same time, the Library also has its equivalent of keywords, the Library of Congress Subject Headings. Whilst a book or work will have one and only one Classification assigned to it (the Classification serving essentially as an index and retrieval key), there may be multiple Subject Headings given (though typically only a few, say 3--6 for a given work). These are used to cross-reference works within the subject index.
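The one-classification / many-headings split maps neatly onto two data structures: a single key that gives each work exactly one position in a linear order, and an inverted index from headings to works. A minimal sketch follows; the titles, call numbers, and headings are hypothetical placeholders, and plain string sorting stands in for real LC call-number filing rules, which are more involved.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Work:
    title: str
    call_number: str  # exactly one per work: the shelf position / retrieval key
    subject_headings: list[str] = field(default_factory=list)  # typically a few per work

# Hypothetical records; none of these are real catalogue entries.
CATALOGUE = [
    Work("Work 1", "Z200", ["Subject headings", "Libraries"]),
    Work("Work 2", "B300", ["Classification", "Philosophy"]),
    Work("Work 3", "Z100", ["Classification", "Libraries"]),
]

def shelf_order(works):
    """The single classification gives every work exactly one place in a linear order."""
    return [w.title for w in sorted(works, key=lambda w: w.call_number)]

def subject_index(works):
    """Several headings per work: each heading points at every work that carries it."""
    index = defaultdict(list)
    for w in works:
        for heading in w.subject_headings:
            index[heading].append(w.call_number)
    return index

print(shelf_order(CATALOGUE))
# ['Work 2', 'Work 3', 'Work 1']
print(subject_index(CATALOGUE)["Classification"])
# ['B300', 'Z100']
```

Works 2 and 3 end up far apart on the shelf but side by side under the "Classification" heading, which is exactly the cross-referencing job the headings do.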

The Subject Headings themselves date to 1898, and, as I'm only learning while writing this comment, there is in fact an article on the ... er ... subject, "The LCSH Century: A Brief History of the Library of Congress Subject Headings, and Introduction to the Centennial Essays" (2009):

<https://www.tandfonline.com/doi/abs/10.1300/J104v29n01_01>

I think something similar was tried on everything2.com back in the day (2000ish).