by sixtyfourbits
1750 days ago
It's all PDF files, which have their own compression, so it's unlikely there would be substantial gain from additional compression. Each torrent has 100 zip files, and each zip file has 1000 PDFs, but the files are stored uncompressed within the zips (i.e. using the STORE method).
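Both points can be checked on a downloaded archive. The following is a minimal sketch using Python's standard `zipfile` and `zlib` modules: it counts how many members use the STORE method and measures how much a plain DEFLATE pass would save. The function name `summarize_zip` is made up for illustration, and reading every member into memory is an assumption that only suits a spot check.

```python
import sys
import zipfile
import zlib


def summarize_zip(zf: zipfile.ZipFile) -> dict:
    """Count members by compression method and measure a DEFLATE re-compression."""
    stats = {"stored": 0, "deflated": 0, "other": 0,
             "raw_bytes": 0, "recompressed_bytes": 0}
    for info in zf.infolist():
        if info.is_dir():
            continue
        if info.compress_type == zipfile.ZIP_STORED:
            stats["stored"] += 1
        elif info.compress_type == zipfile.ZIP_DEFLATED:
            stats["deflated"] += 1
        else:
            stats["other"] += 1
        data = zf.read(info)  # uncompressed member contents
        stats["raw_bytes"] += len(data)
        # Level 9 is the strongest setting zlib offers.
        stats["recompressed_bytes"] += len(zlib.compress(data, 9))
    return stats


if __name__ == "__main__":
    # Usage: python check_zip.py some_archive.zip
    with zipfile.ZipFile(sys.argv[1]) as archive:
        print(summarize_zip(archive))
```

If `recompressed_bytes` comes out close to `raw_bytes`, general-purpose compression on top of the PDFs' internal compression is not buying much.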
You could write a custom compressor that decompiles journal PDFs to valid TeX, then compresses that.
Or at the simpler end of what's technologically possible, you could at least extract shared assets such as fonts that appear in multiple files. Keep files from the same journal together to find more overlaps.
I suspect there's quite a large gain to be had from further compression, at least theoretically. Even more if you could accept some level of non-semantic loss.
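To get a rough feel for the shared-asset idea, here is a sketch that hashes the raw stream bodies in a batch of PDFs and adds up the bytes that repeat across files. It is a naive byte scan rather than a real PDF parser, and the names `STREAM_RE` and `shared_stream_savings` and the `min_size` cutoff are made up for illustration. Embedded fonts are often subsetted per document, so exact-match hashing likely undercounts what a smarter tool could share.

```python
import hashlib
import re
from collections import defaultdict

# Naive match for "stream ... endstream" bodies in raw PDF bytes.
# The lookbehind keeps the "stream" inside "endstream" from starting a match.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def shared_stream_savings(pdfs: dict[str, bytes], min_size: int = 1024) -> int:
    """Return the bytes saved if each stream shared across files were stored once."""
    files_by_digest = defaultdict(set)  # stream digest -> names of files containing it
    size_by_digest = {}
    for name, data in pdfs.items():
        for match in STREAM_RE.finditer(data):
            body = match.group(1)
            if len(body) < min_size:
                continue  # tiny streams are not worth tracking
            digest = hashlib.sha256(body).digest()
            files_by_digest[digest].add(name)
            size_by_digest[digest] = len(body)
    # A stream found in k files could be kept once, saving (k - 1) copies.
    return sum(size_by_digest[d] * (len(names) - 1)
               for d, names in files_by_digest.items())
```

Running this over the files of one journal versus a random mix would show whether keeping a journal together actually finds more overlaps.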