by GHFigs
4932 days ago
I think you don't understand what you're saying. If you were to judge the contents of a library by the size of the largest items on the shelves (exactly as you have done with torrents), you would come away with the mistaken impression that they consisted primarily of dictionaries and boxed sets of language learning CDs. In fact, these items represent a very small portion of the items in the catalog.
What's being counted as a single item here is not a single bound volume of a chemistry journal, nor the entire archive of Bioconjugate Chemistry, but rather the entire chemistry-journals wing of the library: 539 gibibytes, including 226 different journals. By comparison, the latest five items on http://webcache.googleusercontent.com/search?q=cache:http://... are 3.7MiB, 11.7MiB, 350MiB, 730MiB, and 260MiB; the chemistry-journals library is some 2000 times the size of the median of these and roughly 150 000 times the size of the smallest, which happens to be a two-volume book called "Great Moments in Mathematics".
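For anyone who wants to check ratios like these, here is a minimal sketch that takes the quoted sizes at face value. The helper name `size_ratios` is made up for illustration, and 1 GiB is taken as 1024 MiB.

```python
from statistics import median

GIB_IN_MIB = 1024

# Sizes as quoted in the comment above, all converted to MiB.
collection_mib = 539 * GIB_IN_MIB
recent_mib = [3.7, 11.7, 350, 730, 260]


def size_ratios(big, sample):
    """Return (big / median of sample, big / smallest of sample)."""
    return big / median(sample), big / min(sample)


ratio_to_median, ratio_to_smallest = size_ratios(collection_mib, recent_mib)

# The median of the five items is 260 MiB, so the first ratio is a bit
# over 2100; the ratio to the 3.7 MiB item is a bit over 149 000.
print(round(ratio_to_median), round(ratio_to_smallest))
```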
It turns out that when you have a power-law distribution crossing five orders of magnitude, like the one that characterizes file sizes, rather than the much narrower distribution that characterizes book sizes, you actually can get a useful approximation of the makeup of the total by looking at the makeup of only the largest items. It's surely not an unbiased estimator, but it's still a useful one.
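To illustrate why the largest items can stand in for the total under a heavy tail, here is a rough simulation. Everything in it is an assumption for illustration, not measured torrent or library data: the Pareto shape `alpha = 1.1`, the uniform 200 to 400 range standing in for book sizes, the sample size, and the helper name `top_share`.

```python
import random


def top_share(sizes, fraction):
    """Fraction of the total size held by the largest `fraction` of items."""
    ranked = sorted(sizes, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


rng = random.Random(0)  # fixed seed so reruns give the same numbers
n = 100_000

# Assumed heavy-tailed "file sizes": Pareto with shape alpha = 1.1.
heavy = [rng.paretovariate(1.1) for _ in range(n)]

# Assumed narrow "book sizes": uniform between 200 and 400 units.
narrow = [rng.uniform(200, 400) for _ in range(n)]

# Under the heavy tail, the top 1% of items carries a large part of the
# total; under the narrow distribution it carries little more than 1%.
print(f"heavy-tailed, top 1% share: {top_share(heavy, 0.01):.2f}")
print(f"narrow,       top 1% share: {top_share(narrow, 0.01):.3f}")
```

This only shows that the largest items account for much of the bytes under those assumptions; whether their makeup by category matches the rest is a separate question, which is where the bias comes in.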
Feel free to invest the work to do a better approximation.