by camel-cdr
348 days ago
> All digitized books ever written/encoded compress to a few TB.

I tried to estimate how much data this actually is:

# Anna's Archive stats
papers = 105714890
books = 52670695
# word count estimates
avrg_words_per_paper = 10000
avrg_words_per_book = 100000
words = papers*avrg_words_per_paper + books*avrg_words_per_book
# quick test on 27 million words from a few books
sample_words = 27809550
sample_bytes = 158824661
sample_bytes_comp = 28839837 # using zpaq -m5
bytes_per_word = sample_bytes/sample_words
byte_comp_ratio = sample_bytes_comp/sample_bytes
word_comp_ratio = bytes_per_word*byte_comp_ratio # i.e. compressed bytes per word
print("total:", words*bytes_per_word*1e-12, "TB") # total: 30.10238345855199 TB
print("compressed:", words*word_comp_ratio*1e-12, "TB") # compressed: 5.466077036085319 TB
So that is ~30 TB uncompressed and ~5.5 TB compressed. That fits on three 2 TB microSD cards, which you could buy from SanDisk for a total of $750.
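The card count is just a ceiling division. A rough sketch, assuming each card really holds its nominal 2 TB (decimal) and ignoring filesystem overhead; the variable names are only for illustration:

```python
import math

compressed_tb = 5.466077036085319  # compressed estimate printed above
card_capacity_tb = 2.0             # assumed: nominal decimal capacity per card
total_price_usd = 750              # total quoted above

# Round up, since a partially filled card still has to be bought.
cards_needed = math.ceil(compressed_tb / card_capacity_tb)
price_per_card = total_price_usd / cards_needed
spare_tb = cards_needed * card_capacity_tb - compressed_tb

print("cards:", cards_needed)
print("price per card:", price_per_card, "USD")
print("spare capacity:", round(spare_tb, 2), "TB")
```

Usable capacity on real cards is somewhat below nominal, so the spare room would be tighter in practice.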