by zahlman 636 days ago
The issue there is how much gets downloaded from PyPI, not how much storage space it takes up. Making an archive copy of all of PyPI would only require a handful of ordinary drives. But the most popular packages get on the order of ten million daily downloads each.

When someone downloads and installs a package, they generally get a single wheel or sdist out of a potentially massive version/platform matrix for that project. That matrix inflates the storage cost, but it isn't why the bandwidth requirements are so high. A ton of CI systems are apparently poorly designed, individual wheels are bloated, and we can't use the best available compression. Those are the biggest issues.