by lioeters 2390 days ago
Thank you for that, very interesting and educational. I love how you led up to the punchline. It made me see that books as a technology and artifact are part of the "history of information", and how books are becoming subsumed in a shared trajectory with media/data in general.

> half of all the recorded information of humankind was created in the past two years

That is shocking to imagine, and it's exponentially growing.

It reminds me of Vannevar Bush's "As We May Think", pointing out the emerging information overload in society. It certainly puts things in perspective, how we (humanity) have been making a conscious, collaborative effort to develop globally networked computers, one of whose important functions is to help us organize all the information, including books.

The conundrum, it seems, is that technology is also a massive multiplier/amplifier of the amount of data, so its capacity to help us organize may never catch up to what it's helping to produce.

> total storage for the 38 million volumes of the Library of Congress would be slightly under 200 TB

I guess it's redundant to say, but I'm sure in the near future that would fit on a thumb drive!

1 comments

Bush's essay is of course a classic. There are some precursors -- there's a BBC interview of H.G. Wells describing something similar from the 1940s.[1] E.M. Forster's The Machine Stops has some similar ideas. And various encyclopaedists very much embodied similar ideals.

I've been listening to Peter Adamson's "History of Philosophy Without Any Gaps" podcast, which is excellent, and spends a fair bit of time looking at the historiography of the topic -- what works were preserved, how, various interpretations, practices, preservation, and losses. Interesting to note that most of the preserved Greek and Roman works were found in obscure Arabian monasteries and libraries. The mainstream collections themselves were often lost in raids, fires, or other mishaps. Which makes the LibGen situation all the more relevant and urgent.

(I'm a huge user of the site and others like it, for what it's worth.)

On the amount of total data being captured: there's a huge difference between quantity and quality measures of information. They're almost certainly inversely related.

Of what books were written in antiquity, up to the time of the printing press, say, odds were fairly strong that a work would be read.

At 1 million new titles being published per year, there are only 330 people in the US per book, or roughly 400 native English speakers worldwide. (With ~2 billion speakers worldwide, the total audience might reach 2,000 per book). Clearly, most of what's being written will have a very small, or no, audience.
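The per-title arithmetic above can be reproduced in a few lines. This is only a sketch: the population figures are round numbers inferred from the ratios quoted in the comment (330 and 400 per book at 1 million titles, ~2 billion speakers), and the function name is made up for illustration.

```python
# Rough audience-per-title arithmetic using the round figures from the comment.
TITLES_PER_YEAR = 1_000_000

# Assumed round-number populations (inferred from the quoted ratios, not measured).
POPULATIONS = {
    "US population": 330_000_000,
    "native English speakers": 400_000_000,
    "all English speakers": 2_000_000_000,
}


def readers_per_title(population: int, titles: int = TITLES_PER_YEAR) -> float:
    """People available per new title if the population were split evenly."""
    return population / titles


if __name__ == "__main__":
    for label, population in POPULATIONS.items():
        print(f"{label}: {readers_per_title(population):,.0f} people per title")
```

An even split is of course the generous case; real readership is heavily skewed toward a few titles, which only strengthens the point that most works find almost no audience.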

For machine-captured data, the likelihood that any of it is seen directly by a human is vanishingly small. More of it will undergo some level of machine processing or interpretation, though even that only applies to a fairly small fraction of data. Insert old joke about the WORN drive: write once, read never.

As for storage costs (and/or size), at a 15% cost reduction per year, the cost of storage halves about every 4.3 years (4 years and 3 months), which means that in 10 years, the $10k price tag becomes $2k, and in 20 years, it should be under $400. For the entire Library of Congress collection.
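For anyone who wants to check the projection, here is a minimal sketch of the compound-decline arithmetic. It assumes a constant 15% annual price drop and a $10k starting price, both taken from the comment; the function names are hypothetical. Note that the exact compound halving time comes out near 4.3 years, while the rule-of-70 shortcut (70 / 15) gives about 4.67.

```python
import math


def halving_time(annual_reduction: float) -> float:
    """Years for cost to halve under a constant fractional annual reduction."""
    return math.log(0.5) / math.log(1.0 - annual_reduction)


def projected_cost(initial_cost: float, annual_reduction: float, years: float) -> float:
    """Cost after `years` of compounding a constant annual reduction."""
    return initial_cost * (1.0 - annual_reduction) ** years


if __name__ == "__main__":
    rate = 0.15        # assumed constant 15% yearly price drop
    start = 10_000.0   # assumed $10k starting price for ~200 TB
    print(f"halving time: {halving_time(rate):.2f} years")
    for years in (10, 20):
        print(f"after {years} years: ${projected_cost(start, rate, years):,.0f}")
```

The 10- and 20-year figures depend only on the compounding, so they hold regardless of which halving-time estimate is used.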

Flash drives seem to be increasing in capacity by a factor of 10 every 2.5 years. There are now 2 TB flash drives, so 200 TB might be as little as 5 years out. That ... still sounds optimistic to me.

https://m.eet.com/media/1171702/digital_storage_in_consumer_...

https://www.digitaltrends.com/computing/largest-flash-drives...
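The flash-drive estimate can be sketched the same way. This assumes the growth rate really is a steady factor of 10 every 2.5 years, which is the comment's rough reading of the trend rather than an established figure; the function name is illustrative.

```python
import math


def years_to_reach(current_tb: float, target_tb: float,
                   factor: float = 10.0, period_years: float = 2.5) -> float:
    """Years until capacity grows from current_tb to target_tb,
    assuming capacity multiplies by `factor` every `period_years`."""
    periods = math.log(target_tb / current_tb) / math.log(factor)
    return periods * period_years


if __name__ == "__main__":
    # 2 TB today, 200 TB needed for the Library of Congress estimate.
    print(f"{years_to_reach(2.0, 200.0):.1f} years")
```

Going from 2 TB to 200 TB is two factors of 10, hence two periods of 2.5 years. The projection is only as good as the assumed growth rate.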

The more practical problems are simply organising, cataloguing, and accessing the archives. This is an area that still needs help.

________________________________

Notes:

1. I think that's from "Science and the Citizen", 1943, though the BBC and I have a disagreement concerning access. https://www.bbc.co.uk/archive/hg-wells--science-and-the-citi...

While brushing up on the encyclopaedists, I found this little gem:

"Among some excellent men, there were some weak, average, and absolutely bad ones. From this mixture in the publication, we find the draft of a schoolboy next to a masterpiece." — Denis Diderot

Taking the quote out of context (and aside from its historical male-centered language) - it sure rings true of the current state of the web, as well as books.

About the inverse relationship of quantity vs quality, we seem to be drowning in quantity! As you've pointed out, there's great need for thoughtful organization and curation.

I like how you break down the quantifiable aspects to draw a historical trend and future projection. The rise of "data science" and "big data" in the past few decades really makes sense in this light.

I'm sure machine learning and "AI" will play an increasing role in the task of organizing and processing all this information, but at the bottom I feel that the most value probably comes from human curation.

LibGen has been an amazing resource for me as a lover of knowledge, a life-long bookworm. I've got bookshelves and boxes full of physical books as well, but it's a drop in the ocean.

I love the Diderot quote. I'd also encountered earlier:

"As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes. When that time comes, a project, until then neglected because the need for it was not felt, will have to be undertaken...."

... and on for another several paragraphs. It's an extraordinarily keen observation on the state and future of knowledge. At the always excellent History of Information website:

http://www.historyofinformation.com/detail.php?entryid=2877

(Diderot is on my list of authors to explore in more depth.)

The fact that the quality of any given information or exchange is often (though not always) entirely divorced from its source (or author) is another interesting note. There are a few points here worth expanding on.

At least probabilistically, there are spaces (real or virtual) in which it's more likely to encounter good ideas. HN, for its various failings, does well in today's Net. Google+, for all its faults, was similarly useful.

Size matters far less than selection. The tendency for centres of learning, research, and/or inquiry (and not necessarily in that order) to emerge is one that's been long observed, and their durability is remarkable. The first universities (Bologna, Padua, Oxford, Paris, Cambridge, Heidelberg, and others, see: https://en.wikipedia.org/wiki/Medieval_university) are often still, 600 - 700 years later, among the best in the world. Certainly in the US, Harvard, Yale, Princeton, M.I.T., among the earliest founded, remain the most prestigious. Though as noted in the conversation with Tyler Cowen and Patrick Collison, the list from 1920 is "completely the same, except we’ve added on California".

https://conversationswithtyler.com/episodes/mark-zuckerberg-...

What happens as the overall quantity and flux of information increases is that more effective rejection systems are required. That is: you've got too much information flowing in, you want a way to cheaply, with minimal effort or consequential residual load, reject information that may be irrelevant, with minimal bias.

There are numerous systems that have been arrived at, and many of our cognitive biases or informal tests for truth arise out of these (optimism, pessimism, availability, sunk-cost, tradition, popularity, socio-ethnic prejudice, etc.). Randomised methods are probably far fairer and less prone to category error. Michael Schulson's sortition essay in Aeon remains among the best articles I've read in the past decade, if not several:

"If You Can't Choose Wisely, Choose Randomly"

https://aeon.co/essays/if-you-can-t-choose-wisely-choose-ran...

Another fundamental problem is self-dealing and self-selection within institutions. Much of the failure within academia (also touched on by Cowen and Collison, who, I'll note, I don't generally agree with, though they are touching on and making many points I've been pursuing for some years) comes from the fact that it's internal selection of students, faculty, articles, topics, and ideologies, rather than strict tests of real-world validity, which promote these structures.

The same problems infect government and business -- it's not as if any one social domain is immune to this.

Oh, and another lecture by H.G. Wells on that topic:

"...When I go to see my government in Westminster I find presiding over it the Speaker in a wig and a costume of the time of Dean Swift, the procedure is in its essence very much the same. The Members debate bring motions and when they divide the art of counting still in governing bodies being in its infancy they crowd into lobbies and are counted just as a drover would have counted his sheep two thousand years ago...."

https://invidio.us/watch?v=qRgP-46AC_o

(Audio quality is exceptionally poor, 1931 recording.)

Partial transcript: http://www.aparchive.com/metadata/INTERVIEW-WITH-H-G-WELLS-S...

AI ... may be useful, but seems to be result-without-explanation, a possible new form of knowledge, to go with revelation (pervasive if not particularly accurate), technical (means), and scientific (causes / structural).

Wholehearted agreement on LibGen.

Very enjoyable conversation BTW, thank you.

Nature shows us how to process information at ever increasing noise and scale - https://www.edge.org/response-detail/10464

Yes and no.

Briefly: the article distinguishes "endocrinal" vs. "distributed" decisionmaking.

This applies at some levels, but not at others.

For individual humans, we don't have the option of rewiring our consciousnesses, which are rather pathetically single-threaded, and can at best multitask poorly by task-switching, at a very great loss of task proficiency.

Even within collective organisations (companies, governments, organisations, communities), the multiple-independent-actors model works where those actors' actions are autonomous and independent of others. Or, in the alternative, where they work without mutual conflict toward a common goal.

But you get problems where either individual actors' motivations and actions are in conflict, or in which a single global decision must be made (as with various global catastrophic risks), and multiple independent decisions cannot be arrived at. Even for noncritical arbitrary decisions, such as which side of the road to drive on, in which there is no compelling argument to be made for one side or the other, but in which both sides cannot be simultaneously selected, you need some global decisionmaking capacity.

When you reach the point of either an existing decisionmaking system (as in: a single human, with the finite and largely immutable information acquisition and processing capabilities corresponding), or a multi-agent system which must reach a common decision, you've got the challenge of limiting data intake to that amount which allows effective function within the environment, and avoids overloading capabilities or ineffective action.

The article "Evolving the Global Brain" was thought-provoking, especially in the context of our discussion about the history of information and the exponentially increasing amount of information for humanity to gather/produce, process, curate, archive.

It's an attractive concept, that human society is structurally similar to a brain, and that an individual is a neuron. (If humanity is the brain, I suppose the rest of the Earth is the body. We're not doing too well as the self-appointed brain of the operation.)

My first reaction to the analogy of "endocrinal" (one-to-many) and "neural" (many-to-many) decision making is that it's missing a primal psychological/biological motivation of humans to seek to dominate others of their own kind as well as all of nature. I'm not familiar enough with biology to say definitively, but I'm pretty sure the endocrinal system does not actively seek to subjugate the neural system (or vice versa) and dominate the whole body.

Social organization, it seems to me, is more a function of power, very small groups gaining advantage and dominance over vastly larger groups of people, than that of collaboration for mutual benefit. (I might be a bit too cynical of political motivations and authentic democracy these days.)

From the final paragraph:

> ..the current global brain is only tenuously linked to the organs of international power. Political, economic and military power remains insulated from the global brain, and powerful individuals can be expected to cling tightly to the endocrine model of control and information exchange.

I'd disagree with this, and say that the global brain (if we mean the Internet and its empowerment of globally networked intelligence) was born from the wombs of "political, economic and military power". It never achieved escape velocity to become a truly free, autonomous and collaborative, neural model of decision making.

To backtrack a bit:

> Well-connected collective entities like Google and Wikipedia will play the role of brainstem nuclei to which all other information nexuses must adapt.

The most powerfully well-connected collective entities are international political/financial/corporate entities, and indeed they more or less dictate how all information nexuses (nexii?) must adapt.

One biological analogy that comes to mind, is how propaganda and "disinformation" act like neurotoxins in the social brain, introducing noise/entropy, skewing its coherence, and preventing well-informed and orchestrated cooperation.

Another is how established political powers have a well-developed "immune system", composed of mass media, legal structures, military/police force, surveillance of the public. This immune system could be seen at work, for example, at the environmental protests at the Standing Rock Indian Reservation.

The final sentence of the article:

> This formidable design task is left up to us.

By this I assume the author means, evolving the global brain. Quite a challenge! From my perspective, it's going to be a historic struggle: design or be designed.