by fer 1678 days ago
This is my decision flow and I rarely have an issue:

    Are you the end user of the Python code?
        Yes -> Is it available in your distro?
            Yes -> Use the package manager
            No -> Use pip install in user mode
        No -> Create virtualenv with the Python version you want (including pypy!) and do your pip thing there
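The "virtualenv with the Python version you want" branch comes down to running the `venv` module with the interpreter you picked, because a venv is always tied to the interpreter that created it. Here is a minimal sketch. It assumes the interpreter (for example `pypy3` or `python3.11`) is on your `PATH`, and the helper name `make_venv` is made up for illustration.

```python
import os
import shutil
import subprocess
from pathlib import Path


def make_venv(interpreter: str, env_dir: Path, with_pip: bool = True) -> Path:
    """Create a venv at env_dir with the chosen interpreter; return its python.

    A venv is tied to whichever interpreter ran `-m venv`, so choosing the
    interpreter is how you choose the Python version (or PyPy).
    """
    exe = shutil.which(interpreter) or interpreter  # accept a name or a full path
    cmd = [exe, "-m", "venv"]
    if not with_pip:
        cmd.append("--without-pip")  # faster, but the env then has no pip
    cmd.append(str(env_dir))
    subprocess.run(cmd, check=True)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

For example, `make_venv("pypy3", Path("venv-pypy"))` would create a PyPy environment. You then do your pip thing with `venv-pypy/bin/python -m pip install ...`.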

Some extreme use cases may benefit from Anaconda, but personally I've never needed it. My only pain point is dealing with legacy code that relies on PYTHONPATH. Nothing good ever starts by setting PYTHONPATH.

> Create virtualenv with the Python version you want (including pypy!) and do your pip thing there

You might benefit from using Pipx in this case: https://pypa.github.io/pipx/

Pipx is good for the case of "I want to run a standalone Python application that is available through Pip, but not my system's package repo." This is a more common case than you might think.

It's a sensible alternative to `pip install --user`, and giving each tool its own self-contained dependencies is a bit like `npm install --global` or even `volta install`.

Yup, this kinda works.

It doesn't address the greater issue, though: it's getting harder and harder for distributions to package things correctly and provide packages for their users (as evidenced by the fact that you need a second package manager just for Python stuff).

How is Pip any different than Cpanm, NPM, Gem, Luarocks, Nimble, Go's thing, Cargo, whatever JVM people use, whatever Haskell people use, etc. in that regard?

Distros have a hard job, but at the same time programming language tooling devs have more "customers" than just distro maintainers.

> Use pip install in user mode

This is a great recipe for disaster. Whatever you install in user mode will shadow anything installed system-wide, so when you try to run some system-wide project, it may now fail. I'm also not a fan of how it drops scripts into `~/.local/bin`, since that's where I keep my own scripts, which are version controlled.
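You can check from Python itself where user-mode installs go and whether the user site-packages directory is searched ahead of the system one, which is what causes the shadowing. This is a small sketch using the standard `site` and `sysconfig` modules. The helper names are made up for illustration. On a typical Linux setup the scripts directory comes out as `~/.local/bin`.

```python
import os
import site
import sys
import sysconfig
from typing import Optional


def user_scheme() -> str:
    """Name of the install scheme that `pip install --user` targets."""
    if hasattr(sysconfig, "get_preferred_scheme"):  # Python 3.10+
        return sysconfig.get_preferred_scheme("user")
    return "nt_user" if os.name == "nt" else "posix_user"


def user_install_locations() -> dict:
    """Where user-mode installs land: importable packages and console scripts."""
    return {
        "site_packages": site.getusersitepackages(),
        "scripts": sysconfig.get_path("scripts", user_scheme()),
    }


def user_site_shadows_system() -> Optional[bool]:
    """True if the user site-packages dir is searched before the system one.

    Returns None when the user site is not on sys.path at all (for example
    inside a venv, or when the directory does not exist yet).
    """
    user = site.getusersitepackages()
    system = [p for p in site.getsitepackages() if p in sys.path]
    if user not in sys.path or not system:
        return None
    return sys.path.index(user) < min(sys.path.index(p) for p in system)
```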

The installation will also be frozen and never get updated -- unless you remember to do it manually.

Finally, and worst of all, this leaves you in a dead end if your packages have conflicting dependencies, which is too often the case in Python-land.

So you're suggesting always using virtualenv?

I used to just use pip to install to the system. Months/years later I would try to untangle the mess of packages I was just playing with, what the OS wanted/needed, I got those conflicting dependencies you mention, etc. I usually ended up reinstalling the OS. At the time I may not have been as knowledgeable about where the OS package manager keeps packages vs pip--but the whole thing wasn't very user-friendly either.

For years I've been installing into user knowing I can just blow it away. I've dabbled with virtualenv, but it's such a pain to set up and activate. If I have a few projects with similar libraries it's more of a pain to set them all up and switch around. If I end up using a script for something important, I just spend the extra time at that point to "package" it.

This is one of the reasons people use Anaconda/miniconda for non-data science work: conda environments are self-contained Python installs, so if you conda/pip install packages into those environments, they will not break each other. This design requirement arose from the specific needs of numerical computing (which always drags in a ton of system-level C/C++/FORTRAN dependencies), but is a generically useful design construct.

Anaconda is a distro and conda is a package manager; together they work across OS platforms and hardware architectures and install cleanly into userland without requiring admin privileges. The only way we achieve this difficult goal is by creating a distro and build system that produce "portable" packages that can be relocated/relinked at install time.

Ultimately, Python's challenges in this department come from the fact that it has such great integration with low-level C/C++ libraries. This gives it super powers as duct tape/glue language, but it also drags it down into the packaging tech debt of C/C++. Hmm... maybe I should write that blog post: "Python Packaging Isn't The Problem; C/C++ Is." :-)

I was slow to get to grips with venv, and it sounds like you are on the same path. This note is meant as constructive advice:

* Some distro software uses python. Let the package manager take care of dependencies for that.

* For everything else, use a dedicated virtualenv for each codebase you are working with.

   > I used to just use pip to install to the system
Never do this, for the reasons you cite.

   > I've dabbled with virtualenv, but it's such a pain to set up and activate
Setup for virtualenv: "python3 -B -m venv venv". Have a shell alias 'alias v=". venv/bin/activate"' that allows you to activate it if you need to install libraries or access a shell. "pip install blah" for library install. That should be all you need.

   > If I have a few projects with similar libraries
   > it's more of a pain to set them all up and switch around
Have a think about why you feel this way, and whether you could mitigate the problems.

Here is what I do. Once my libraries are installed for the current project, I rarely activate venv in the current shell. Rather, for each python project, I have a bash script "app" in the root of the project, and a dedicated "venv" directory.

The app script does the following: (1) sources the local venv; (2) does pip freeze > requirements.txt to capture any dependency changes; (3) launches the project. Often I will have multiple launchers in that script, with all of them commented out except for the active one. Get into the habit of always launching from that app script.
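The same launcher can be sketched in Python instead of bash. Rather than sourcing `venv/bin/activate`, it calls the venv's own interpreter directly, and that is enough to run code inside the environment. The `venv` directory and `requirements.txt` names follow the comment above. `launch` and `venv_python` are hypothetical helper names.

```python
import os
import subprocess
from pathlib import Path
from typing import Sequence


def venv_python(project_root: Path) -> Path:
    """Path to the interpreter inside <project_root>/venv."""
    if os.name == "nt":
        return project_root / "venv" / "Scripts" / "python.exe"
    return project_root / "venv" / "bin" / "python"


def launch(project_root: Path, args: Sequence[str], freeze: bool = True) -> int:
    """Run `python <args>` with the project's venv interpreter; return the exit code.

    With freeze=True, first snapshot the installed packages into
    requirements.txt (step 2 of the app script).
    """
    py = venv_python(project_root)
    if not py.exists():
        raise FileNotFoundError(f"no venv interpreter at {py}; run: python3 -m venv venv")
    if freeze:
        frozen = subprocess.run(
            [str(py), "-m", "pip", "freeze"],
            capture_output=True, text=True, check=True,
        ).stdout
        (project_root / "requirements.txt").write_text(frozen)
    # Launch the project itself from the project root.
    return subprocess.run([str(py), *args], cwd=project_root).returncode
```

A call such as `launch(Path("."), ["-m", "myapp"])` would freeze and then start the project. Here `myapp` is a placeholder for your own entry module.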

To reiterate the approach above, whenever you sit down to write some python code, ensure that you have a dedicated venv for it, and that you are only ever launching code from that local venv.

I have spoken to developers who get upset at the extra hard disk overhead. You don't need to optimise for hard disk usage. Hard disk space is almost free.

I don't bother creating setup.py files, except on the odd occasion that I want to publish code to PyPI. Good luck.

Thanks for taking the time to share!

That sounds like the general approach I take for "projects", even toy ones. My day jobs have never fit the virtualenv use case, so at home I often have to look up how to use it. I use it so rarely that I even forget the aliases I make.

Most new things are one-off scripts: moving or renaming some files, extracting data from something, or pulling from a resource. Something that needs libraries or is too big for a shell script. For example, the last one I see in my bin is a web scraper for appointments. It pulls a website, fills out a form, and gets the result a few times, in about 70 lines. What's annoying is having to source some environment just to run this one tool.

Most people who spend a lot of time at a command line have a directory of scripts (a mix of shell, Perl, or Python). It's quite a pain to source an environment just to run a quick script. Those are generally the libraries I install in user mode: I don't care much about the versions and troubleshoot things as they come up.
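One possible middle ground for a scripts directory is a single shared venv that you reference by absolute path. Running a venv's interpreter directly already uses that environment, because activation mostly just adjusts `PATH`. Here is a sketch of that idea. The location `~/.venvs/scripts` and the helper names are hypothetical.

```python
import os
import subprocess
import venv
from pathlib import Path

# Hypothetical location for one venv shared by all personal scripts.
DEFAULT_SCRIPTS_VENV = Path.home() / ".venvs" / "scripts"


def shared_interpreter(env_dir: Path = DEFAULT_SCRIPTS_VENV, with_pip: bool = True) -> Path:
    """Create the shared venv on first use and return its python path.

    Put the returned path in a script's shebang line (or call it directly);
    no `activate` step is needed because the interpreter knows its own env.
    """
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    py = bin_dir / ("python.exe" if os.name == "nt" else "python")
    if not py.exists():
        venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return py


def in_venv(py: Path) -> bool:
    """Ask the given interpreter whether it is running inside a venv."""
    out = subprocess.run(
        [str(py), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip() == "True"
```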

Hm, why? I'm a happy user of PYTHONPATH!
It's completely global, shared by all Python interpreters of all versions.

I set PYTHONPATH, but the code in that directory is solely small debugging utils of mine that I want available in every Python interpreter, and I make sure not to put anything more complex in there.

"It's completely global"

It doesn't have to be. You can have a launcher for a project that sets PYTHONPATH just when you launch that project.
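A per-project launcher that sets PYTHONPATH only for the process it starts could look roughly like this sketch. It copies the current environment, adds PYTHONPATH to the copy, and passes that copy to the child, so neither this process nor your shell is changed. The function name is hypothetical.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_pythonpath(extra_dirs: Sequence[str], args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `python <args>` with PYTHONPATH set for that child process only."""
    env = dict(os.environ)  # a copy; os.environ itself stays untouched
    env["PYTHONPATH"] = os.pathsep.join(extra_dirs)
    return subprocess.run(
        [sys.executable, *args],
        env=env, capture_output=True, text=True, check=True,
    )
```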

What is bad practice is to be setting PYTHONPATH in your .bashrc, and for the reason you give - that makes it global across python launches.

That'll prevent it leaking to most things, but not to subprocesses of your application.

For example your application might interact with command-line tools written in Python, and unless you delete PYTHONPATH from the environment variables prior to launching any subprocesses, they'll inherit it. This could lead to subtle and confusing breakage.
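To avoid that inheritance, the application can drop PYTHONPATH from the environment it hands to external tools. Here is a minimal sketch. The helper names are made up, and it removes only PYTHONPATH and no other variables.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_without_pythonpath(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment (os.environ by default) with PYTHONPATH removed."""
    env = dict(os.environ if base is None else base)
    env.pop("PYTHONPATH", None)
    return env


def run_tool(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run an external command without passing our PYTHONPATH down to it."""
    return subprocess.run(
        list(args), env=env_without_pythonpath(), capture_output=True, text=True
    )
```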

So are PATH and LD_LIBRARY_PATH. You just change those as you need to...

Alright. I actually set it in a Docker container. It works well there.

The only justified situation I can find is when you are working on two (or more) independent components at the same time.

My pain point with PYTHONPATH in particular (or with fiddling with sys.path) is that people tend to use it solely to make import lines shorter, which brings all sorts of naming collisions when you aren't creative enough with names.
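The collision problem is easy to reproduce. When two directories on PYTHONPATH both contain a module with the same short name, the directory listed first wins without any warning. This sketch uses two throwaway directories and a hypothetical module name `helpers`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def which_helpers_wins(first_on_path: str) -> str:
    """Create dirs 'a' and 'b', each with its own helpers.py, and report which
    one `import helpers` picks up when `first_on_path` is listed first."""
    with tempfile.TemporaryDirectory() as d:
        dirs = {}
        for name in ("a", "b"):
            p = Path(d) / name
            p.mkdir()
            (p / "helpers.py").write_text(f"ORIGIN = {name!r}\n")
            dirs[name] = str(p)
        other = "b" if first_on_path == "a" else "a"
        env = dict(os.environ, PYTHONPATH=os.pathsep.join([dirs[first_on_path], dirs[other]]))
        out = subprocess.run(
            [sys.executable, "-c", "import helpers; print(helpers.ORIGIN)"],
            env=env, cwd=d, capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
```

Swapping the order of the two entries changes which `helpers` gets imported, even though nothing in the importing code changed.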

Yeah, how else do you git clone some random package and immediately use it without "installing" it?

PYTHONPATH is simple and obvious how to use, and is similar to using LD_LIBRARY_PATH and friends.