by occoder 1478 days ago
Never mind less slow, how about making it work first?

I'm in a disprivileged location, and it seems pip can only download from pythonhosted.org at a rate of 10-20 kB/s. Worse still, pip downloads time out and fail extremely quickly.

If I rerun pip install, instead of resuming the download, it downloads the file from the beginning, then times out and fails again somewhere in the middle.

I tried going through a private VPN hosted on Linode, with similar results.

For the official Python package manager, this is simply unreliable and unacceptable behavior.

3 comments

When in Africa, I worked around that problem and many others using a pypi proxy. Today devpi is the standard: https://devpi.net/docs/devpi/devpi/stable/%2Bd/index.html

You pip install from it instead of PyPI. It in turn downloads from PyPI and passes the result to you. It also keeps itself updated, and you can batch-download overnight the packages you expect to work with.

As a result, our entire team always had most packages locally available. Changing machine or location, or purging the cache, didn't mean losing this benefit.

Besides, pip caching wheels doesn't mean it's not making any requests, so it's still a better experience.

For big companies, I recommend it anyway: it speeds up the entire team's work and CI, allows you to publish private packages, etc.

If you can't do that, the next best thing is to run "pip download" instead of "pip install", and save the wheels onto a hard drive.
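
A minimal sketch of that "pip download" workflow, driving pip through `subprocess`. The helper names are made up for illustration; the flags (`download --dest`, `install --no-index --find-links`) are standard pip options. Note that `pip download` fetches files matching the machine it runs on, so this assumes the target machines use a compatible platform and Python version.

```python
import subprocess
import sys
from pathlib import Path


def download_cmd(requirements: Path, wheel_dir: Path) -> list[str]:
    """Build the command that fetches packages into wheel_dir without installing."""
    return [sys.executable, "-m", "pip", "download",
            "--dest", str(wheel_dir), "-r", str(requirements)]


def offline_install_cmd(requirements: Path, wheel_dir: Path) -> list[str]:
    """Build the command that installs from wheel_dir only, never from an index."""
    return [sys.executable, "-m", "pip", "install",
            "--no-index", "--find-links", str(wheel_dir),
            "-r", str(requirements)]


def fetch(requirements: Path, wheel_dir: Path) -> None:
    """Run once while a connection is available."""
    subprocess.run(download_cmd(requirements, wheel_dir), check=True)


def install_offline(requirements: Path, wheel_dir: Path) -> None:
    """Run later, on any machine that can see wheel_dir."""
    subprocess.run(offline_install_cmd(requirements, wheel_dir), check=True)
```

For example, `fetch(Path("requirements.txt"), Path("/mnt/usb/wheels"))` on the connected machine, then `install_offline(...)` with the same arguments wherever the drive is mounted. Both paths are placeholders.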

Thanks for the tips, I'll definitely look into them.

EDIT: And thanks for the tip from godmode2019. I upvoted you both.

However, they don't change the fact that pip by default assumes a decent Internet connection, with its short timeout and no resume on downloads, and thus is unusable with anything less.

Again, IMHO that's simply unacceptable for the official Python package manager.

Unfortunately this is the norm for most package managers these days, and I feel sorry pip can’t make the out-of-the-box experience more pleasant for you. If you have ideas on how pip can improve the experience without compromising the zero-config default (which a lot of Python newcomers rely on), please feel free to suggest them; we are all ears.

I would say that Python, and especially pip, is actually relatively friendly to scenarios where non-obstructed Internet is not generally available, compared to many other offerings from other programming languages (especially those without corporate backing). Devpi was already mentioned as a solution; in fact, since pip’s --index-url accepts a file:// URL, you can even simply pre-download the wheels, arrange them in the correct hierarchy (see PEP 503) inside a thumb drive, and just pass that around. The --find-links --no-index combination may also be an interesting approach for simple setups. There are a lot of things to try, before you conclude things do not work.
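
As a rough illustration of the "correct hierarchy" part, here is a sketch that arranges a flat folder of wheels into a PEP 503 style layout (one directory per normalized project name, each with an `index.html` of links). The function names and paths are hypothetical, only `.whl` files are handled, and optional PEP 503 extras such as hash fragments are left out.

```python
import re
import shutil
from html import escape
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 specifies."""
    return re.sub(r"[-_.]+", "-", name).lower()


def build_simple_index(wheel_dir: Path, index_root: Path) -> None:
    """Copy each wheel into index_root/<project>/ and write the index pages."""
    index_root.mkdir(parents=True, exist_ok=True)
    projects: dict[str, list[str]] = {}
    for wheel in sorted(wheel_dir.glob("*.whl")):
        # In a wheel filename, the distribution name is the part before the first "-".
        project = normalize(wheel.name.split("-")[0])
        target = index_root / project
        target.mkdir(exist_ok=True)
        shutil.copy2(wheel, target / wheel.name)
        projects.setdefault(project, []).append(wheel.name)

    # One page per project, linking to the files sitting next to it.
    for project, files in projects.items():
        links = "\n".join(
            f'<a href="{escape(name)}">{escape(name)}</a><br>' for name in files)
        (index_root / project / "index.html").write_text(
            f"<!DOCTYPE html>\n<html><body>\n{links}\n</body></html>\n",
            encoding="utf-8")

    # Root page listing every project directory.
    root_links = "\n".join(
        f'<a href="{project}/">{project}</a><br>' for project in sorted(projects))
    (index_root / "index.html").write_text(
        f"<!DOCTYPE html>\n<html><body>\n{root_links}\n</body></html>\n",
        encoding="utf-8")
```

After `build_simple_index(Path("wheels"), Path("/mnt/usb/simple"))`, something like `pip install --index-url file:///mnt/usb/simple <package>` should resolve against the drive. For a flat folder with no index pages, `--no-index --find-links <folder>` needs no layout at all.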

I do have a few suggestions, starting from least effort for most impact:

1. AFAICT pythonhosted.org is hosted at fastly.net, which seems to be heavily throttling downloads from certain disprivileged locations, as well as downloads from cloud providers such as Linode. It would help a lot if they could ease up a little on the aggressive throttling.

2. Make the pip download timeout longer to better accommodate spotty connections.

3. Make pip downloads resumable so that a download makes progress each time pip install is run.

4. Make pip download each file over multiple HTTP connections in parallel. Download throttling applies to a single connection, so downloading over multiple connections will speed up the download.

Thanks a lot for listening!
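
To make suggestions 2 and 3 concrete, here is a sketch of a resumable download using an HTTP `Range` request and a generous timeout. This is not how pip works internally; the function names, the `.part` suffix, and the 60 second default are illustrative assumptions, and it only helps if the server honours `Range` requests.

```python
import os
import urllib.error
import urllib.request


def resume_request(url: str, offset: int) -> urllib.request.Request:
    """Build a GET request that asks the server to skip the first `offset` bytes."""
    request = urllib.request.Request(url)
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    return request


def download_with_resume(url: str, dest: str, timeout: float = 60.0,
                         chunk_size: int = 64 * 1024) -> None:
    """Download url to dest, continuing from a partial file left by an earlier run."""
    partial = dest + ".part"
    offset = os.path.getsize(partial) if os.path.exists(partial) else 0
    try:
        response = urllib.request.urlopen(resume_request(url, offset), timeout=timeout)
    except urllib.error.HTTPError as err:
        # 416 = requested range not satisfiable. Assumption: the partial file
        # already holds the whole download, so just finalize it.
        if err.code == 416 and offset > 0:
            os.replace(partial, dest)
            return
        raise
    with response:
        # 206 means the Range header was honoured, so append to what we have.
        # Any other success status means the full file is being sent again.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
    # Only a completed download gets the final name.
    os.replace(partial, dest)
```

If the connection drops or times out mid-transfer, the exception propagates but the `.part` file keeps the bytes received so far, so each rerun makes progress. A real implementation would also want to verify the finished file against a known hash.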

I've had ~5 Mbps download speeds for the last 5 years in a rural US location. I set up devpi on an old Raspberry Pi hoping to have this same experience, but after several months I disabled it and found it was actually slowing things down. If memory serves, the issue was that the local package lookups were excruciatingly slow (even when done on localhost), and I was surprised to see that devpi would routinely soak up multiple GB of memory and start swapping (I recall a few issues on memory leaks in their repo).

Seemed like a great idea, and perhaps I just needed something beefier than an RPi 3, but it didn't work out for me.

You can change the pypi index URL with `pip config set global.index-url`.

For example, as a user in China, I often switch to `https://pypi.tuna.tsinghua.edu.cn/simple`. You may need to look up which mirror is available and fastest in your region.

Thank you! This tip may just be the cure.

But I shouldn't need to know this; pip should take care of picking the best download mirror.

Or it could just support resuming a previous download. I don't actually mind waiting for a slow download or having to rerun pip install a few times, as long as it makes progress each time I run it.

People should be aware that adding additional mirrors (instead of switching mirrors as suggested here) can lead to big slowdowns, as pip has no way to determine whether a given / found wheel is the best choice without comparing all options, so it will continue to search both locally and all available indices until it can resolve the "best" wheel to download.

Some discussion: https://discuss.python.org/t/why-does-pip-reach-out-to-indic...

I don't recall if this issue is solved by specifying hashes (or perhaps even pinned versions are adequate) -- I would hope so.

Note that in this instance the user is setting index-url, so no additional indexes are added, but the default index is replaced. So as long as the replaced index is faster, it will result in a faster install. It’s the extra-index-url option that should be used with caution.

Indeed! Which is why I specified:

> adding additional mirrors (instead of switching mirrors as suggested here)

pip can install directly from GitHub:

pip install git+<repo_url>