| Bush's essay is of course a classic. There are some precursors -- there's a BBC interview of H.G. Wells describing something similar from the 1940s.[1] E.M. Forster's The Machine Stops has some similar ideas. And various encyclopaedists very much embodied similar ideals.
I've been listening to Peter Adamson's "History of Philosophy Without Any Gaps" podcast, which is excellent, and spends a fair bit of time looking at the historiography of the topic -- which works were preserved and how, the various interpretations and practices, and what was lost. Interesting to note that most of the preserved Greek and Roman works were found in obscure Arabian monasteries and libraries. The mainstream collections themselves were often lost in raids, fires, or other mishaps.
Which makes the LibGen situation all the more relevant and urgent. (I'm a huge user of the site and others like it, for what it's worth.)
On the amount of total data being captured: there's a huge difference between quantity and quality measures of information. They're almost certainly inversely related. For books written from antiquity up to the time of the printing press, say, the odds were fairly strong that any given work would be read. With 1 million new titles published per year, there are only about 330 people in the US per book, or roughly 400 native English speakers worldwide per book. (Counting all ~2 billion English speakers worldwide, the potential audience might reach 2,000 per book.) Clearly, most of what's being written will have a very small audience, or none.
For machine-captured data, the likelihood that any of it is seen directly by a human is vanishingly small. More of it will undergo some level of machine processing or interpretation, though even that only applies to a fairly small fraction of data. Insert old joke about the WORN drive: write once, read never.
As for storage costs (and/or size), at a 15% cost reduction per year, the cost halves roughly every 4.3 years (about 4 years and 3 months), which means that in 10 years the $10k price tag becomes about $2k, and in 20 years it should be under $400. That's for the entire Library of Congress collection.
Flash drives seem to be increasing in capacity by a factor of 10 every 2.5 years. There are now 2 TB flash drives, so 200 TB might be as little as 5 years out. That ... still sounds optimistic to me.
https://m.eet.com/media/1171702/digital_storage_in_consumer_...
https://www.digitaltrends.com/computing/largest-flash-drives...
The more practical problems are simply organising, cataloguing, and accessing the archives. This is an area that still needs help.
________________________________
Notes:
1. I think that's from "Science and the Citizen", 1943, though the BBC and I have a disagreement concerning access. https://www.bbc.co.uk/archive/hg-wells--science-and-the-citi... |
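For anyone who wants to check the back-of-envelope figures in the comment above, here is a minimal Python sketch. The function names are made up for illustration, and the inputs (15% yearly cost decline, $10k starting price, tenfold flash capacity every 2.5 years, 1 million titles per year, the population figures) are just the comment's own rough estimates, not measured data. With exact compounding the halving time at 15% per year works out to about 4.27 years; the rule-of-70 shortcut (70 / 15) gives about 4.67.

```python
import math


def halving_time(annual_decline: float) -> float:
    """Years for a cost to halve if it falls by `annual_decline` (e.g. 0.15) each year."""
    return math.log(0.5) / math.log(1.0 - annual_decline)


def projected_cost(start_cost: float, annual_decline: float, years: float) -> float:
    """Cost after `years` of compounding decline at `annual_decline` per year."""
    return start_cost * (1.0 - annual_decline) ** years


def years_to_grow(factor: float, tenfold_period: float) -> float:
    """Years to grow by `factor` if capacity grows tenfold every `tenfold_period` years."""
    return tenfold_period * math.log10(factor)


def audience_per_title(population: float, titles_per_year: float) -> float:
    """Potential readers per new title if readers were spread evenly across titles."""
    return population / titles_per_year


if __name__ == "__main__":
    # Storage cost: 15% decline per year, starting from $10k (figures from the comment).
    print(f"halving time: {halving_time(0.15):.2f} years")
    print(f"cost after 10 years: ${projected_cost(10_000, 0.15, 10):,.0f}")
    print(f"cost after 20 years: ${projected_cost(10_000, 0.15, 20):,.0f}")

    # Flash capacity: 2 TB -> 200 TB is a factor of 100.
    print(f"years to reach 200 TB: {years_to_grow(200 / 2, 2.5):.1f}")

    # Audience per book at 1 million new titles per year.
    for label, population in [("US", 330e6), ("native speakers", 400e6), ("all speakers", 2e9)]:
        print(f"{label}: {audience_per_title(population, 1e6):,.0f} people per title")
```

The $2k and under-$400 projections hold up under exact compounding; only the halving interval depends on whether you use the shortcut or the logarithm.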
"Among some excellent men, there were some weak, average, and absolutely bad ones. From this mixture in the publication, we find the draft of a schoolboy next to a masterpiece." — Denis Diderot
Taken out of context (and setting aside its historically male-centered language), the quote sure rings true of the current state of the web, as well as books.
About the inverse relationship of quantity vs quality, we seem to be drowning in quantity! As you've pointed out, there's great need for thoughtful organization and curation.
I like how you break down the quantifiable aspects to draw a historical trend and future projection. The rise of "data science" and "big data" in the past few decades really makes sense in this light.
I'm sure machine learning and "AI" will play an increasing role in the task of organizing and processing all this information, but at bottom I feel that the most value probably comes from human curation.
LibGen has been an amazing resource for me as a lover of knowledge and a lifelong bookworm. I've got bookshelves and boxes full of physical books as well, but it's a drop in the ocean.