Hacker News
by guessmyname 963 days ago
I’ve noticed that I never quite feel at ease with the Python programs I write.

I’ve been using Go to create projects, both big and small, since 2013.

Almost every time I attempt to build something even remotely complex with Python, I end up regretting it, especially when other people besides myself start using these programs. The main problem is the lack of assurance that the same program will function correctly on another person’s computer. With Go programs, it’s as simple as having a statically linked binary, and given the ease of cross-compilation, I’m very confident that what works on my machine will work on my coworker's or customer's computer as well.

You know how some people suggest that Shell scripts should not exceed a certain number of lines, because beyond that point, it’s better to create a Python, Ruby, PHP, or similar script? I experience a similar sentiment when working with Python. A few hundred lines may be acceptable, but anything larger than that, I believe, is better suited to be written in a compiled language.

12 comments

I feel the same way.

Python has been my go-to language for a long time, but lately I've been noticing that I've been holding off on writing new tools with it because in the back of my mind I have this nagging feeling that making them robust and portable will take too much work, and so I don't even bother getting started.

It's this trap of yes you get to ~99% pretty fast, but the last 1% (packaging/distribution) then takes forever.

But I'm still looking for a good alternative... Golang does the job—no question, but it doesn't spark joy for me.

While there is definitely a higher barrier to entry, once I got comfortable with Rust (and finally stole someone's working cross-compile / publish GitHub Actions for it) it has supplanted Golang in this use case because it does spark joy for me.
My rule of thumb used to be shell scripts past 100 lines get converted to Python, and Python scripts past 1000 lines should get converted to something else. But in practice, the Python has almost always stayed.
I think simple shell scripts are usually more terse than python.

But as a shell script grows, python starts winning.

By the time you get to 1000 lines of python, you are probably doing a lot of heavy lifting and it is probably non-trivial to change languages.

My shell-to-python heuristic is similar, though I'll write longer shell scripts if I find I need to run a lot of subprocesses (it's just unwieldy in python) and I'll write shorter python scripts if I have to do logic best expressed with objects, tuples, hashtables etc. (Technically bash has everything you need, but I would prefer not to).

Of course, there are languages like Ruby and Perl that would cover both bases pretty well, but I'm not willing to introduce a third scripting language to most teams and projects I work on. Not to mention that those languages have their own issues.

I have known Python since version 1.6, and Go is such a downgrade in productivity that I would only use it when not given an option, like on some DevOps tools.

As someone that has experience with static binaries since 1990, way before dynamic loading was a common option in modern computing, yeah it works on the other computer, provided the distribution is exactly the same, and all required files and network configurations are exactly the same.

I can't say I can relate at all. If you do things from scratch that might be true, but there is a pretty popular Python tool called cookiecutter that allows you to generate the basic skeleton of the app. I usually pick something that contains poetry, click (I guess there is typer now) and some linting choices.

For fun I just googled a template and tried: https://github.com/radix-ai/poetry-cookiecutter

And the result is quite good.

Your comment assumes that python cli scripts need to be single liners, but IIRC there are several tools that allow you to bundle a package into a single file like pex, shiv, and zipapp.
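To make the `zipapp` option a bit more concrete, here is a minimal sketch using only the standard library. The `hello_app` directory, its contents, and the `build_pyz` and `demo` helpers are made up for illustration; pex and shiv have their own command-line tools, which this does not cover.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_pyz(source_dir: Path, target: Path) -> Path:
    """Bundle a directory containing __main__.py into one runnable .pyz file."""
    zipapp.create_archive(
        source_dir,
        target=target,
        interpreter="/usr/bin/env python3",  # shebang written at the top of the archive
    )
    return target


def demo() -> str:
    """Build a throwaway app in a temp dir, run the archive, return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "hello_app"  # hypothetical app directory
        app.mkdir()
        (app / "__main__.py").write_text('print("hello from a single file")\n')
        pyz = build_pyz(app, Path(tmp) / "hello.pyz")
        result = subprocess.run(
            [sys.executable, str(pyz)], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

The resulting file still needs a Python interpreter on the target machine, so this is a single-file bundle rather than a true static binary.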

And it offers awful templates. Basically, everything it generates is wrong.

But such is the reality of Python world. Every third-party library or tool you use is defective in some major and plenty of minor ways. And you have to be prepared to undo, fix, reimplement whatever you get, and be very, very selective about the tools and libraries you choose to live with.

That actually looks very good, thanks!
There are packaging tools for Python, and if your tooling is targeting people already using Python, just relying on `pip` + writing a proper pyproject.toml is a good solution nowadays (protip for people with virtualenv issues: direnv solves so much of this it's not funny).

But I have been looking around for a while for something that's more certain than `pip`, and unfortunately everything I've found (like Bazel or Buck) suffers from having to do a lot of futzing to use dependencies.

Pip and pyproject.toml have no way of helping you to get scripts to your system.

Pip doesn't really know how to install programs. Pyproject.toml is completely irrelevant to the problem. What pip can do is install (generated) files from the scripts section of the Wheel it's installing into the directory for executables known to your Python environment. In most cases this directory will not be on the system path, and even if it is, you are better off not using this functionality. Instead, rely on your system's packaging tools to install files there, so that those tools can track them, deal with conflicts caused by upgrades / downgrades, remove them, audit them etc.
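If you want to check the "not on the system path" claim on your own machine, something like the following rough sketch would do it. It asks `sysconfig` for the default scripts directory of the running interpreter; the helper names `scripts_dir` and `is_on_path` are made up here, and installs made with `pip install --user` use a different scheme that this does not look at.

```python
import os
import sysconfig
from pathlib import Path
from typing import Optional


def scripts_dir() -> Path:
    """Directory where this Python environment places console scripts."""
    return Path(sysconfig.get_path("scripts"))


def is_on_path(directory: Path, path_var: Optional[str] = None) -> bool:
    """Return True if `directory` is one of the entries in PATH."""
    raw = os.environ.get("PATH", "") if path_var is None else path_var
    wanted = os.path.normcase(os.path.abspath(directory))
    entries = [p for p in raw.split(os.pathsep) if p]
    return any(os.path.normcase(os.path.abspath(p)) == wanted for p in entries)


if __name__ == "__main__":
    d = scripts_dir()
    print(f"scripts directory: {d}")
    print(f"on PATH: {is_on_path(d)}")
```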

> virtualenv

Whoa, this fossil is still alive somewhere? I think you probably meant venv. virtualenv is a throwback to the Python 2 era. Not that it's bad because of that, but you should probably warn your readers about this detail.

> pip vs Bazel or Buck

Are you sure you understand what these tools are supposed to do? pip installs Python packages. Bazel and Buck build (mostly Java) packages. The analogue in Python world to Bazel and Buck would be SCons, maybe setuptools.

In other words, pip doesn't know how to build Python packages. Sometimes it wants to build them (which is bad, and you should never do that), but it never does it on its own -- it uses other tools to do that, and the tools could be anything, setuptools, CMake, MSVC, rustc... whatever the authors of that particular library chose to use to build it. In particular, pip could, in principle, call Bazel to build a package (would be a weird twist, but not impossible).

On the other hand, tools like Bazel or Buck would usually use something else to install packages, if those are needed during build, eg. Maven.

Do you actually use Python? No offense but most of what you’ve typed here makes no sense.
Must be doing it much more and much better than you do.

I started with Python after Peter Norvig promised that Python can be an OK substitute for Common Lisp. That promise turned out to be a bald-faced lie, but learning some Python made me more employable, so I'm not complaining. I took my first steps with Python when Twisted was popular, there were "old-style" and "new-style" classes and you could raise whatever you wanted (not necessarily an exception), and setup.py files were written in such a way as to use distutils if setuptools wasn't installed.

I've also contributed to CPython (reported bugs mostly). Wrote a bunch of C, some C++, Rust and Go code that produces Python modules as well as contributed to pip, setuptools, conda... Again, mostly bug reports or small patches for specific bugs, but still.

At day job, my role is in infrastructure, which is mostly written in Python, so, I deal with stuff like Linux kernel to userspace interface, various system utilities, or cloud-related stuff, mostly OpenStack. Another aspect I'm involved with at day job is CI and packaging. Perhaps the utility I wrote that's seen the most use is one that deals with combining multiple wheels into a single wheel to speed up deployment. It's not sophisticated, but turned out to be very useful. Another popular utility is used to dismantle Linux storage so that it can be re-defined and re-assembled. What it does is it traverses /sys/block looking for various devices and connections between them, finds the right order in which these devices need to be stopped / removed / disassembled and does that. Again, this isn't very exciting, but turned out to be useful.
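For readers wondering what "finding the right order" amounts to: it is essentially a topological sort over the device stack. Below is a minimal sketch of that idea, not the utility described above. The device layout is hard-coded and hypothetical, and `teardown_order` is a made-up name; on a real system the mapping might be read from the `holders` directories under /sys/block, but that is an assumption.

```python
from graphlib import TopologicalSorter


def teardown_order(holders: dict[str, set[str]]) -> list[str]:
    """Order devices so that everything stacked on a device is stopped before it.

    `holders` maps each device to the devices built directly on top of it.
    """
    # TopologicalSorter expects node -> predecessors; a device's holders
    # must be handled first, so they play the role of predecessors here.
    return list(TopologicalSorter(holders).static_order())


# Hypothetical layout: a device-mapper volume on a RAID array built from two disks.
EXAMPLE = {
    "sda": {"md0"},
    "sdb": {"md0"},
    "md0": {"dm-0"},
    "dm-0": set(),
}

if __name__ == "__main__":
    print(teardown_order(EXAMPLE))
```

The relative order of `sda` and `sdb` is not fixed, since neither depends on the other. A cycle in the input makes `static_order()` raise `graphlib.CycleError`.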

What do you do?

Have you tried pipx?

I find it solves the installation problem really well: each installed program gets its own virtual environment, but the single binary is still added to your path.

> its own virtual environment

I would never want that. This is the exact opposite of what an installer should do. The whole point of using Python is to rely on the previously installed stuff: both the runtime and other libraries. If I'm making a Python program, I package it as a DEB or RPM (for work, this is what we support). I'm not saying these are great tools or are pleasant to work with, but I find the end result to be acceptable.

Similarly, I guess, I'd make an MSI or w/e is the modern way to install on MS Windows, if I ever have to. I don't know what the equivalent is on other systems.

The separate environment thing ensures you get the exact versions of the libraries that are guaranteed to work with the tool, without risk of upgrading a library in a way that breaks something else.

I wonder if pipx and venv could grow functionality based around symlinks that allowed installations using the exact same package version to avoid having two copies of the files?

A compression based file system hack might provide a better result though, since it could optimize differences between two dependency versions with only minor changes.

> The separate environment thing ensures you get the exact versions of the libraries that are guaranteed to work with the tool,

1. I think it's the wrong goal.

2. It's easier to achieve that in ways that don't require bloat on my system.

It's much nicer for users if the library interfaces with its dependencies respecting the versioning rules: doesn't use undocumented, unreleased stuff, doesn't rely on undocumented side-effects etc. Only relies on what's tested and released. In this way, there's no need to be very selective about versions you have.

Unfortunately, this is not how the world is. The reality is that especially in popular environments like Python, you get a crapload of very low-quality libraries, with poorly defined dependencies, with people who don't understand the infra side of things and have convoluted dependency requirements. But, I usually try to fight back. If I absolutely have to have a library with convoluted requirements, I fork it and fix the nonsense. Or vendor it.

Another unfortunate, and quite ironic, side-effect of this situation is that people are so dedicated to specifying nonsense requirements (eg. including patch in requirement specification even though Python doesn't even have a patching mechanism) that popular programs used to install libraries are optimized for these absurdly specific requirements. I.e. it's faster to install requirements with pip or conda if you give them the exact list, preventing the solver from putting in much work. This puts people who want to make good libraries at a disadvantage because their libraries will take forever to install.

Which leads me to the following conclusion: if I want super-precise requirements, I don't need pip or conda. I can simply curl -o the packages I need, and it will be much faster and a lot more reliable.

> without risk of upgrading a library in a way that breaks something else.

I'm not afraid. I usually know what I'm installing. If it breaks, I'll fix it. I actually want to know when and why it breaks, so this is also an anti-feature for me.

> A compression based file system

Do you mean deduplication / CoW filesystem? Not sure why this is a hack. Compression in filesystems typically compresses individual blocks. It won't help you if different files have the same contents; the theoretical benefit comes from entropy within a file, not because some files share contents.
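A toy illustration of that distinction, as a sketch only: the 4096-byte block size is an assumption, the function names are made up, and no particular filesystem works exactly like this. Random bytes are used so that compression has nothing to gain within a file, which isolates the effect of two files sharing contents.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def compressed_size_per_block(data: bytes) -> int:
    """Total size after compressing each block independently."""
    return sum(
        len(zlib.compress(data[i : i + BLOCK_SIZE]))
        for i in range(0, len(data), BLOCK_SIZE)
    )


def deduplicated_size(files: list[bytes]) -> int:
    """Total size if files with identical content are stored only once."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


if __name__ == "__main__":
    payload = os.urandom(64 * 1024)  # high-entropy content
    files = [payload, payload]  # the same file present in two environments
    raw = sum(len(f) for f in files)
    per_block = sum(compressed_size_per_block(f) for f in files)
    print(f"raw: {raw}")
    print(f"per-block compression: {per_block}")
    print(f"deduplicated: {deduplicated_size(files)}")
```

Per-block compression never sees that the second copy exists, while content-based deduplication stores it once.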

Using Python with Bazel is fairly common at big SV companies -- they use rules_python with it (https://github.com/bazelbuild/rules_python). It does rely on pip for grabbing dependencies but handles building modules and integrates well with rules_docker/rules_oci for building container images from your code.
What's SV?

I honestly don't know why anyone would use that... as in what does Bazel do better than virtually anything else that can provide this functionality. But, I used to be an ops engineer in a big company which wanted everything to be Maven, regardless of whether it does it well or not. So we built and deployed with Maven a lot of weird and unrelated stuff.

Not impossible, but not anything I'd advise anyone to do on their free time.

Specifically wrt the link you posted, if you look here: https://github.com/bazelbuild/rules_python/blob/main/python/... it says that only pure Python wheels are supported, but that's also a lie, they don't support half of the functionality of pure Python wheels.

So, definitely not worth using, since lots of functionality is simply not there.

SV - Silicon Valley.
pip will install dependencies transitively. Some of those dependencies or some version of those might be uninstallable on certain platforms and you won't even know!

Further, if I am building using Python 3.11 features and you are stuck on Python 3.10 then you cannot install my Python CLI tool.
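One common mitigation is to fail early with a readable message instead of a confusing traceback. A minimal sketch, assuming a 3.11 minimum as in the example above; `MIN_VERSION`, `is_supported`, and `main` are names invented for illustration.

```python
import sys

MIN_VERSION = (3, 11)  # assumed minimum, matching the example in the comment


def is_supported(version=None, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum


def main() -> int:
    """Refuse to run on interpreters that are too old."""
    if not is_supported():
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, MIN_VERSION))
        print(f"this tool needs Python {wanted}+, found {found}", file=sys.stderr)
        return 1
    print("ok")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

This only helps if the file containing the check still parses on the older interpreter; syntax that is new in 3.11 anywhere in that file would raise a `SyntaxError` before the check runs. Declaring `requires-python = ">=3.11"` in the `[project]` table of pyproject.toml makes pip refuse the install up front, which does not remove the limitation but at least reports it clearly.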

Well, if some dependencies are uninstallable on certain platforms, another packaging technique won’t magically solve that!

I too would like it for things to just magically be good (I think pyinstaller seems kinda close but I dislike how it works based on scanning your code. PyOxidizer is another), but was just mentioning that pip is an alright distribution tool for a part of the population (one that uses Python)

How about converting it to Nix derivation?

https://github.com/nix-community/poetry2nix

> I experience a similar sentiment when working with Python. A few hundred lines may be acceptable, but anything larger than that, I believe, is better suited to be written in a compiled language.

Python, IMO, has no niche anymore. A few hundred lines of Python is a hundred lines of Zsh, or the same few hundred lines of C++, and to top it off, there's the shit show of Python tooling for deployment. setup.py, requirements.txt, pyproject.toml… Fifteen files with overlapping contents in twelve different grammars (mild exaggeration), with new ones added every other year. Setuptools can't find your entrypoint…

Fingers crossed for vlang[0]. It's like golang with better types and more syntactic sugar. Feels like a proper upgrade from Python.

I really hope they succeed.

[0]: https://vlang.io/

For me Python is addictive.

You know the tooling is bad and in the long term it will hurt, but the standard library and third party packages are just phenomenally productive and that’s a huge draw.

I was going to learn Python for the same reason: to create utilities that would run on most any computer. Mostly to do things like file-parsing and data-format conversion.

But the Python ecosystem seems to be such a disappointing mess that I just gave up on the whole idea. I'm learning JavaScript/TypeScript now and you can build CLI programs with Deno.

You don't need Deno if all you're doing is simple utilities for parsing data and making file format converters. The native browser runtime is more than capable on its own—and your users already have it installed; you don't need to bring another vendor's runtime into the equation just to run a JS program—few people are going to have Deno on their computer.

The part of the ecosystem that belongs to Node/Deno branch of the family tree also tends to promote bad practices (while insisting they're good practices), and that's before you get to the part where the runtimes themselves implement quirky/non-standard dialects and APIs. It's not a community that's known for being especially rational or having high standards for intellectual honesty.

If you really want to write stuff that will run on most people's computers, target the World Wide Wruntime: write standard JS that the browser won't choke on. You can do it in a way that people are allowed to run it from the command-line if they want but doing so is optional. Here's a 7-part tutorial that explains how: <https://triplescripts.org/example/>

Thanks for that; I'll check it out. I was not talking about using a browser to run anything at all; strictly command-line utilities. (Update: I read much of the first few pages of that triple-script tutorial, and I definitely like the stated goals. Added to reading list!)

Deno has a way to package up the necessary JS runtime and make a self-contained executable. I'm sure it's bloated as hell, but again I don't want to require a browser.

Do you have any examples of said "bad practices" and non-standard dialects? I'm building a server with Deno right now to provide a REST-style API for a mobile app (nothing fancier than CRUD and some push notifications). The contenders for me were PHP 8 and Deno. Since I wanted to learn JavaScript anyway, I went with Deno. So far I've had a decent experience.

> Do you have any examples of said "bad practices" and non-standard dialects?

There's an inexhaustible list. But here are some:

- `require`, `module.exports`, and `.mjs`

- `Buffer`

- Abusing arrow functions and generally going out of one's way to reimplement `this`, poorly

- Closures everywhere (and near zero regard for runtime consequences, i.e. perf incl. memory usage, or legibility of code)

- Abusing `===` (i.e. using it everywhere and yelling at you if you don't—even going so far as to write codestyle bots and other tooling that forces you to change occurrences of `==` to `===` e.g. to get the build to succeed); lines where `===` is used instead of `==` should ideally make up something like less than one half of 1% of your code (generous), if it ever occurs at all

- A whole slew of "My First Experience with Polymorphism and Types™" antipatterns that are unwisely encouraged like `function foo(x) { if (typeof(x) == "string") /* ... */ }` and naive use (i.e. misuse) of `instanceof`, plus a bunch of packages like is-uint8array and/or basically the entirety of the (non-standard) utils.types namespace

Thanks for the reply. Several of these I haven't encountered yet, but I wholeheartedly agree that the uselessness of "==" and insistence on "===" is some amateur-hour junk.

Let's see, what else... yes, I don't see the point of arrow functions. And the reliance on RTTI is just straight-up bad programming in any language.

So what would you choose to write a server in? I'm writing a fairly straightforward server to present a REST-style API and access a database for a mobile app. I'm doing all this alone, so presumably I'm going to have to rely on at least a few frameworks for "routing" and serialization because I don't think I have time to roll my own.

> the uselessness of "=="

Does not compute.

The npm package called "pkg" seems to be the standard for packaging NodeJS applications

https://www.npmjs.com/package/pkg

Unfortunately you also need to bundle all your code into a single file for it to work, but you can use any bundler (webpack, parcel, etc) you want at least

If you distribute any CLI tool you should include the runtime and any attached dependencies, but with dynamic languages that can easily put your distributable in the tens of megabytes in size which is a bit of a pain.

I mean for the longest time the AWS CLI used the python/pip installed in your own machine and it probably caused thousands of man-hours of wasted time.

The equivalent to static linking in Python would be bundling all code into an archive (including transitive dependencies), along with an interpreter. Some shell script can be used to unpack and run.

It's possible, just not the norm.

I once wrote a tool that would do health checks before doing anything; it would format the results in a lovely table.

It would clone repositories (microservices) and configure LXC containers.