by Tipewryter 1804 days ago
I prefer containers over pyenv and poetry. This way not only python version and dependencies are "in one place" but also all other stuff that comes along with a new project. The OS, the database etc.

The one thing I dislike about Python projects is that Python plasters the compile cache files all over the place. Is there a way to change that? Currently I use the -B flag for all my scripts, but that makes them slow. I wish Python had an option to behave like PHP and keep cached compilations in memory instead of on disk, or at least somewhere in /tmp/.
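
On the wish for cache files somewhere under /tmp/: a minimal sketch, assuming CPython 3.8 or later, where `sys.pycache_prefix` (also settable through the `PYTHONPYCACHEPREFIX` environment variable) redirects `.pyc` files into a separate directory tree. The source path below is hypothetical and does not need to exist.

```python
import importlib.util
import sys
import tempfile

# Assumption: CPython 3.8+, which provides sys.pycache_prefix.
cache_root = tempfile.mkdtemp(prefix="pycache-")
sys.pycache_prefix = cache_root

# Hypothetical source file; cache_from_source only computes a path,
# it does not read or compile anything.
source_path = "/srv/myproject/app.py"
cached_path = importlib.util.cache_from_source(source_path)

# The bytecode path now lives under cache_root instead of in a
# __pycache__ directory next to app.py.
print(cached_path)
```

This keeps the on-disk cache, so startup stays as fast as usual, while the project tree itself stays free of cache directories.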

5 comments

Poetry is a great way to manage library dependencies in containerized apps.
Why would I need poetry? Doesn't "pip3 install -r requirements.txt" do everything I need?
Pip is fine; it depends on your goals. I've found requirements.txt less enjoyable to maintain for several reasons: you need to separate dev and test dependencies yourself, and there's no notion of a lockfile for transitive dependencies (`pip freeze` notably doesn't distinguish the dependencies you asked for from transitive ones). Pip is also darn slow at installing dependencies once you hit a certain scale, and Poetry outperforms it pretty substantially.
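
To make the direct-versus-transitive distinction concrete, here is a small sketch. The package names and the dependency graph are hypothetical; the point is only that a flat list of everything installed loses the information about which packages were actually requested.

```python
from collections import deque

# Hypothetical dependency graph: package name -> packages it requires.
REQUIRES = {
    "webapp-framework": ["template-engine", "http-utils"],
    "template-engine": ["markup-escape"],
    "http-utils": [],
    "markup-escape": [],
    "test-runner": ["plugin-loader"],
    "plugin-loader": [],
}


def resolve(direct):
    """Return every package that ends up installed for the given direct deps."""
    installed = set()
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        if name in installed:
            continue
        installed.add(name)
        queue.extend(REQUIRES.get(name, []))
    return installed


direct = {"webapp-framework", "test-runner"}
installed = resolve(direct)        # what a flat "freeze" style list would contain
transitive = installed - direct    # what only a tool that tracks `direct` can tell apart

print(sorted(installed))
print(sorted(transitive))
```

A lockfile-based tool keeps `direct` in one file and the fully resolved set in another, which is the separation described above.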

Poetry does what I expect a package manager to do, and does it well, especially when working with a team of developers on an application versus individually. There's not a compelling reason for me to use pip directly as a less functional alternative.

Additional requirement files for dev and test don't seem like a burden to me.

Can you describe an issue that you had by not locking transitive dependencies?

Bit rot, "it works on my machine"-style issues, and cache misses on dependency installation (which can really bloat deploy times in deploy pipelines by busting Docker caches across machines, too). It can also be a security issue if a vulnerable library version is published and you install it because your dependencies aren't locked, especially in Python, where package install scripts have a lot of power.

Lock files help solve for these. You can build software without solving them, but it makes my life easier.

All of this. Plus picking up a legacy project from someone with a giant requirements file and then trying to pick through and work out what we actually want locked and what's been installed by something deep in a dependency tree is a nightmare. Even if you don't use poetry for your own sake, use it for everyone else's.
Good question! From a template repo commit at work[1]:

Advantages:

- Separates development and production dependencies.

- The dependency version is specified separately from the lock file. In practice this means that the version in pyproject.toml generally only needs to be set to anything other than asterisk if and when it becomes necessary to use a specific version range.

- The lock file includes SHA-256 checksums by default, and these are checked during installation.
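
The checksum point can be sketched in a few lines. This is not Poetry's implementation, only an illustration of what comparing a downloaded artifact against a pinned SHA-256 digest means; the artifact bytes and function names are made up for the example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_hash(data: bytes, pinned_hex: str) -> bool:
    """True if the data hashes to the digest recorded in the lock file."""
    return hmac.compare_digest(sha256_hex(data), pinned_hex)


# Hypothetical artifact; in practice this would be a wheel or sdist.
artifact = b"pretend this is a downloaded wheel"
pinned = sha256_hex(artifact)  # what a lock file would have recorded

print(matches_pinned_hash(artifact, pinned))                # True
print(matches_pinned_hash(artifact + b"tampered", pinned))  # False
```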

Disadvantages:

- More complex configuration than Pip.

- Python package managers come and go, and this one is likely going to suffer the same fate eventually.

- Introduces poetry.toml simply to specify that the virtualenv should be in the project directory. The default is to put virtualenvs under Poetry's cache directory (for example ~/.cache/pypoetry/virtualenvs on Linux), which is a non-standard location and therefore might interfere with typical IDE setups, mounting the virtualenv in containers or VMs, and the like.

[1] https://github.com/linz/template-python-hello-world/pull/106...*

> The dependency version is specified separately from the lock file.

That. The simple fact that a pip requirements file mixes the packages you want with the dependencies those packages require is a valid reason to switch to Poetry IMO.

Yeah, I don't think I've created a single venv in the last 2 years. I don't need them for basic stuff (e.g. a quick script) and for anything more I'd rather have a container so I can deal with all dependencies in one place and use it elsewhere quicker if I need to.
How do you make sure your dependencies are not tampered with? https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...
That is a very broad question. Can you mention a specific attack vector? Then I might be able to explain how I do or do not avoid it.
The link describes the attack vector. pipenv locks the dependencies using hashes. If your company has an internal package called my-company-py-lib, pip could install a public library that pretends to be the internal one.
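
A deliberately simplified model of that confusion, not pip's actual resolver: both indexes and their version lists are hypothetical, and the "resolver" just pools candidates from every index it is given and takes the highest version.

```python
# Hypothetical package indexes: name -> available versions.
PRIVATE_INDEX = {"my-company-py-lib": ["1.2.0", "1.3.0"]}
PUBLIC_INDEX = {"my-company-py-lib": ["99.0.0"]}  # attacker-published look-alike


def version_key(version):
    """Turn '1.3.0' into (1, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_highest(name, indexes):
    """Naive resolver: pool candidates from all indexes, take the highest version."""
    candidates = [
        (version_key(version), version, label)
        for label, index in indexes
        for version in index.get(name, [])
    ]
    _, version, label = max(candidates)
    return version, label


both = [("private", PRIVATE_INDEX), ("public", PUBLIC_INDEX)]
print(pick_highest("my-company-py-lib", both))                          # ('99.0.0', 'public')
print(pick_highest("my-company-py-lib", [("private", PRIVATE_INDEX)]))  # ('1.3.0', 'private')
```

In this model the look-alike wins as soon as the public index is consulted at all; restricting lookup to the private index removes it from the candidate set.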
Yes, you can set PYTHONDONTWRITEBYTECODE=1 in your environment, but I think it is equivalent to -B.
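
One way to check that equivalence, as a rough sketch: start a child interpreter once with the flag and once with the environment variable, and read back `sys.dont_write_bytecode` in each. The helper name `probe` is made up for this example.

```python
import os
import subprocess
import sys

PROBE = "import sys; print(sys.dont_write_bytecode)"


def probe(flags=(), env_extra=None):
    """Run a child interpreter and report its sys.dont_write_bytecode value."""
    # Start from a copy of the environment without the variable under test.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONDONTWRITEBYTECODE"}
    env.update(env_extra or {})
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


via_flag = probe(flags=["-B"])
via_env = probe(env_extra={"PYTHONDONTWRITEBYTECODE": "1"})

print(via_flag, via_env)
```

Both routes end up setting the same interpreter switch, so neither avoids the recompilation cost the original post complains about.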
If you need a specific OS then isn't a VM the solution rather than a container?

Also at what point do people just realize that all of this overhead is a gigantic waste of time and just use a better language?

A Docker container starts in two seconds or so. And gives me everything I need. So no need to dabble with a VM.

There is not much overhead in running a project in a container. The project has a setup file that turns a fresh Debian 10 into whatever environment it needs. And that's it. Run that setup script in your Dockerfile to create a container and you are all set. Want to run the project in a VM or on bare metal? Just install Debian 10, run the setup script, and you are all set.

> Also at what point do people just realize that all of this overhead is a gigantic waste of time and just use a better language?

Probably some time shortly after your developer time costs less than your cloud compute time. Until you hit that point (if ever) there are few options as cost-effective as Python.

In any language, if you use non-vendored shared libs you will hit this problem. It's certainly not specific to Python; in fact, this is the reason package managers on *nix are necessary (and not just a nice-to-have).
Yeah to be honest if you need containers to make a project reproducible this is just a sign of failure. You're basically saying you need to encapsulate the entire system for your code to run correctly.
Loads of tools have external dependencies that are hard dependencies. And library search paths on my machine vary from those on prod... I could go on, but I'm not sure I understand why using a container to manage all that is a failure?
There are plenty of reasons why you might want to containerize a project. If you have a lot of system dependencies, for example, you might want to consider including a Dockerfile in your project to make it portable.

However, what makes Python a failure is that people feel they need this to dependably run a Python program that has only pure-Python dependencies.

Compare this to a language like Rust, or the NPM ecosystem. In those cases, the tools have managed to dependably encapsulate projects such that you only need the package manager to make a project fully repeatable.

With either of those ecosystems, there's basically one system dependency, and you can find any repository online and dependably do `git clone ...` then `cargo build` etc. to make it work. With Python, you effectively have to reproduce the original developer's system, and that is a failure.

Huh? Either something is really weird about your env or we have different ideas about what counts as a pure Python package.

Because if you don’t rely on Python packages with extensions that farm out to external libs it’s as easy as git clone, pyenv virtualenv, pip install -r, and python -m build.
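
For the environment step only, here is a sketch that swaps in the standard library's `venv` module for `pyenv virtualenv`; the directory names are arbitrary, and the clone, install and build steps are left out because they need a real repository and network access.

```python
import tempfile
import venv
from pathlib import Path

# Stand-in for a freshly cloned project directory.
project_dir = Path(tempfile.mkdtemp(prefix="demo-project-"))
env_dir = project_dir / ".venv"

# with_pip=False keeps the sketch offline and quick;
# a real setup would want pip available inside the environment.
venv.create(env_dir, with_pip=False)

# pyvenv.cfg records which base interpreter the environment points at.
print((env_dir / "pyvenv.cfg").read_text())
```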

The thing that makes this worse than other ecosystems is:

1. virtualenv shouldn't be necessary. This is more or less the same concept as containerization. It is only needed because Python has a fractured ecosystem, and setting up your environment for one project can break another.

2. You also have to know which environment encapsulation and package management solution the library author is using; this is not standardized.