by pansa2 1654 days ago
IIRC a big issue with PyInstaller is that the built executables are really self-extracting archives, which have to write many files to disk before they can run. Compared to a real compiled executable, it’s slow and inelegant.

Unfortunately high-quality bundling into executables just isn’t a focus of Python (nor of any other high-level language). Personally, I’ve gone back to C++ for building command-line apps - as a developer I’d much rather be writing Python, but that’s no good if I can’t actually deliver software to users.

12 comments

> Unfortunately high-quality bundling into executables just isn’t a focus of Python (nor of any other high-level language).

You seem to be ignoring Tcl, which solved this problem about 20 years ago with Starkits (https://wiki.tcl-lang.org/page/Starkit) and Starpacks (https://wiki.tcl-lang.org/page/Starpack).

> Unfortunately high-quality bundling into executables just isn’t a focus of Python (nor of any other high-level language)

Wouldn't Dart's [1] and Go's executable support qualify as such (high-level languages with good executable support)?

For example, Sass is now using Dart as well [2].

[1] https://dart.dev/tools/dart-compile

[2] https://sass-lang.com/dart-sass

Let me know once Dart or Go can run a wide ecosystem of libraries and middleware that apply to almost any situation. There's a reason people still use languages like C++, Python, and Java despite them being objectively bad in some respects. Without mainstream adoption you've got a chicken-and-egg problem.
Sure, here's your reminder that Go supports a wide ecosystem of libraries and middleware that apply to almost any situation.
I think you're missing my point here. Let's say you're given a PyTorch model. You can either run it with native Python or grab the Go bindings that are used by 5 people, and if you get stuck there, god help you. You'll be hitting obscure bugs in no time.

Even if something officially supports multiple languages, the obscure ones will usually get inferior bindings, simply because they're used less and are consequently not as well tested or patched as the main ones. It's unfortunate but that's life I guess.

> I think you're missing my point here. Let's say you're given a PyTorch model. You can either run it with native Python or grab the Go bindings that are used by 5 people, and if you get stuck there, god help you. You'll be hitting obscure bugs in no time.

To use a Python-specific framework you're better off using Python. Well duh.

PyTorch, despite the name, is really a C++ library with well-developed Python bindings. There's certainly nothing in principle about it that's Python-specific.
That’s one thing I like about Nim. It can compile to C++ and so can directly wrap C++ code. While Nim’s ecosystem is small, it’s easy enough to wrap almost any C/C++ library. It’s great for OpenCV. Some folks have been updating a direct wrapper of the PyTorch C++ API (1)!

1: https://github.com/SciNim/flambeau

These languages face the same challenges as Python does once you look beyond compiling pure-<language> code to executables. It is beyond frustrating to constantly have to deal with this common misconception in these arguments.
I haven't used Dart, but I have 15 years of experience with Python and 10 years with Go. Go absolutely solves many of Python's problems including performance, single-native-binary compilation, dependency management, a lackluster-at-best static typing system, and many others. If you're an experienced Python developer, then you're used to announcements that promise a lot and utterly fail to deliver (consider all of the different package management "solutions", alternative runtimes, etc), but Go really does what it says on the tin.
Could you provide an example of what you mean? It seems obvious that using a non-Go library with Go would be more complicated, but is that not also the case with C or C++? Do they have some special way of using a Go or Python library that Go does not reciprocate?
That's exactly the point. The common misconception that frustrates me is that people compare their experience working on a pure-Go project with pure-Go dependencies to their experience working on pure-Python projects with non-Python (e.g. C, or Rust, or anything else really) dependencies.

So to illustrate: if you were to work on a pure-Python project with pure-Python dependencies, existing tooling (such as PyInstaller, or Nuitka, or others) can provide single-binary executables just as easily as you can in Go.

PyInstaller is kinda dumb here because it doesn't cache the extracted files; on every launch it extracts all of them (which is especially slow on Windows), runs the application from that temporary directory, then deletes it.
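
For anyone who hasn't seen it from the application side: a minimal sketch of how code running inside a one-file PyInstaller bundle can find that temporary directory. `sys.frozen` and `sys._MEIPASS` are the attributes PyInstaller's bootloader sets; the helper names `bundle_dir` and `resource_path` and the `fallback` parameter are made up for this illustration.

```python
import sys
from pathlib import Path


def bundle_dir(fallback: Path) -> Path:
    """Return the directory holding the app's data files.

    `fallback` (hypothetical parameter) is used when the code runs as a
    normal, unfrozen Python program.
    """
    # In a PyInstaller bundle, sys.frozen is set and sys._MEIPASS points at
    # the directory the bootloader unpacked the bundled files into.
    meipass = getattr(sys, "_MEIPASS", None)
    if getattr(sys, "frozen", False) and meipass:
        return Path(meipass)
    return fallback


def resource_path(relative: str, fallback: Path) -> Path:
    """Build the path of a bundled data file."""
    return bundle_dir(fallback) / relative
```

Because the directory is created fresh on every start, anything the app writes into it is gone after exit, which is the flip side of the "no caching" complaint.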

At work I've built a simple cross-platform packaging system which avoids issues like this and also separates library components from the main application (the former is rarely changed, the latter is frequently changed). Despite the entire application being around ~25 MB, it starts in around 30-50 ms from a network share on Linux and 1-2 seconds on Windows. On Linux it's just a bash script eventually calling a system-provided python 3 interpreter, on Windows there's a C# .exe which does roughly the same but the package also includes a Python interpreter (which is just the official "Python for embedding into applications" ZIP) which the C# process loads via P/Invoke. The entire thing is less than 1000 lines of Python/bash/C# and works with every package under the sun and apart from the C# stubs, which basically never change, it's supremely easy to "cross compile" because compilation in this system is combining the wheels of all dependencies into a single zip, which also makes builds completely reproducible.
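
To make the "compilation is combining the wheels into a single zip" part concrete, here is a rough sketch of what such a step could look like. This is not the commenter's code: `merge_wheels`, the fixed timestamp, and the "first wheel wins" clash rule are all assumptions for the sake of illustration. Wheels are ordinary zip archives, so the standard `zipfile` module is enough.

```python
import zipfile
from pathlib import Path

# Assumption: a fixed timestamp and fixed permissions, so the same inputs
# always produce a byte-identical archive (reproducible builds).
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def merge_wheels(wheels: list[Path], out_path: Path) -> None:
    """Combine the files of several wheels into one zip archive."""
    entries: dict[str, bytes] = {}
    for wheel in sorted(wheels):
        with zipfile.ZipFile(wheel) as zf:
            for name in zf.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                # Simplification: on a name clash the first wheel wins.
                entries.setdefault(name, zf.read(name))

    with zipfile.ZipFile(out_path, "w") as out:
        # Sorted order keeps the output independent of input order.
        for name in sorted(entries):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # regular file, rw-r--r--
            out.writestr(info, entries[name])
```

How the merged archive is then loaded (imported in place or unpacked first) is a separate design choice that the comment does not spell out.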

> then deletes it

Is that built into Python or does the developer call that cleanup? I ask because I recently switched from youtube-dl to yt-dlp on a Windows machine and it left multiple temporary directories full of .py files to be cleaned up by other tools. I did Ctrl+C out of it at least once. Perhaps I interrupted the cleanup routine? Is the cleanup routine expected to trap kill signals or is that on the developer?

The cleanup code is part of the stub/bootloader, so depending on how the process dies it doesn't run.
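
The same pattern in plain Python, as a rough analogy only (the real bootloader is native code, and `run_extracted` is a hypothetical name): cleanup in a `finally` block survives exceptions and Ctrl+C, but nothing runs if the process is killed outright.

```python
import shutil
import tempfile
from pathlib import Path


def run_extracted(app) -> Path:
    """Create a temp dir, run `app(workdir)`, then delete the dir.

    Returns the (by then deleted) path so callers can check the cleanup.
    """
    workdir = Path(tempfile.mkdtemp(prefix="bundle_"))
    try:
        app(workdir)
    finally:
        # Runs on normal exit, on exceptions and on KeyboardInterrupt
        # (Ctrl+C). It does NOT run if the process is killed outright,
        # e.g. SIGKILL or TerminateProcess; the directory is then left
        # behind for something else to clean up.
        shutil.rmtree(workdir, ignore_errors=True)
    return workdir
```
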
- Does not work for native dependencies.

- Does not allow splitting the application up into stable/unstable parts to reduce update sizes.

- Does not directly allow multiple entry points.

- Requires Python to be already installed, or requires bundling Python loosely, which is very slow when used from a network share.

- Uses zipimport under the hood, which is very slow on network shares, and very very very slow on Windows network shares.
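
For reference, this is roughly what "uses zipimport under the hood" means: a zip file placed on `sys.path` is searched like a directory, and every import turns into reads from that archive (hence the pain on slow network shares). The sketch below uses only the standard library; `import_from_zip` and `demo` are illustrative names. It also hints at the first bullet: zipimport only handles `.py`/`.pyc` files, not compiled extension modules.

```python
import importlib
import sys
import tempfile
import zipfile
import zipimport
from pathlib import Path


def import_from_zip(archive: Path, module_name: str):
    """Import `module_name` from `archive` by putting the zip on sys.path."""
    sys.path.insert(0, str(archive))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(archive))


def demo():
    """Build a tiny archive and import a pure-Python module from it."""
    archive = Path(tempfile.mkdtemp()) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello_from_zip.py", "GREETING = 'hi'\n")
    module = import_from_zip(archive, "hello_from_zip")
    loaded_by_zipimport = isinstance(module.__spec__.loader, zipimport.zipimporter)
    return module.GREETING, loaded_by_zipimport
```
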

You also need to make sure to compile the binaries on an OS version that is close to your target deployment version. I remember having RHEL 6.x vs. RHEL 7 issues due to libc or a similar dependency.
Yes this. Differing glibc versions caused us headaches.
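
A cheap guard for this is to check the build machine's glibc version before producing a release. A minimal sketch, assuming the usual rule that a binary built against a newer glibc may not load on a system with an older one; `platform.libc_ver()` is standard library, while `host_glibc`, `parse_version` and `runs_on` are hypothetical helper names.

```python
import platform
from typing import Optional


def parse_version(text: str) -> tuple:
    """Turn '2.17' into (2, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def host_glibc() -> Optional[str]:
    """Return the glibc version of this machine, or None if not glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None


def runs_on(build_glibc: str, target_glibc: str) -> bool:
    """True if a binary built against `build_glibc` should load on a
    target with `target_glibc` (i.e. the target is not older)."""
    return parse_version(build_glibc) <= parse_version(target_glibc)
```

For example, RHEL 6 ships glibc 2.12 and RHEL 7 ships 2.17, so `runs_on("2.17", "2.12")` is false: build on the older system if you need to support both.
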
I thought that it is the case only when you bundle into "one file"? If you don't pass the --onefile parameter, all files are in one folder without any archiving, am I wrong?
That’s right - but if you don’t use `--onefile` you don’t really have a “stand-alone executable” at all.
Of course you have a stand-alone executable. At least if by "stand-alone" you mean a copy you can ship to users. Which is at least the only thing I care about. And tell me which popular (say) Windows software actually comes as a single .exe. They're all many files in multiple folders. And there's no problem with that. You have one installer, which users download, which then extracts the files. Your criticism points out a non-issue.
Correct. I don't use --onefile at all. I go the old-skool way, and package my python Windows app(s) into an installer using Inno Setup.
I've not tried it myself, but I've heard of people having good experiences using Nuitka (https://nuitka.net/) for building standalone Python executables.
I've used it a couple of times. When it works, it's great, but there can be issues with bundling dependencies correctly. As such, I've resorted to using it just with the standard library, or a few third-party modules that are Nuitka-friendly.
I love Go for quick and dirty CLI apps. They are a bit bigger than the C app but so much friendlier to develop on. Rust is better for more complex processing, but it is also less straightforward (at least for me).
I say this as a Rust fan and a (sometimes overzealous) promoter; it's just easier to wrap your head around programming in Go. They have made it so easy and enjoyable. It really is a pleasure for quick concepts to table, so to speak.
While not entirely self-contained, `pex` is pretty close. It relies on a Python interpreter in the environment somewhere, but in my experience it successfully packages a venv + code into a single executable "binary" with no extraction stage.
I fully agree. A close runner-up to C++ for bundling into executables is C#, in my opinion.
How well does C# work for this cross-platform? Are its executables very large?

I considered using Go for command-line apps - but in my case they often have to interact with existing C++ code. That’s easy enough to do from Python, but I was advised that interop with other languages is an area in which Go is particularly weak.

I think a standalone "Hello World", built with .NET Core, is around 60-70 MB on both Linux and Windows. The startup times are alright - slower on Windows than on Linux in my experience because of the file system performance.
Lower than this. If I use the AOT compile option with .NET Core 3.1 I get a 40MB binary which goes down to 10MB with the trim option.

Just like Go, which embeds its runtime, a .NET Core binary compiled with AOT embeds the CLR but, unlike Go, it has options to trim out the pieces that aren’t necessary, which I really like.

I did not know about PublishTrimmed and will try it out. Thank you.
I think this is one of those features that was still a bit of a preview in .NET 5, so you should use the just-released v6 for PublishTrimmed. There are also situations where it doesn't work out of the box (e.g. WinUI 3!)
> I think a standalone "Hello World", built with .NET Core, is around 60-70 MB large on both Linux and Windows.

Lol what?

What goes into a 70 MB implementation of 'hello world'?

It’s not a 70 MB implementation of Hello world. It’s a 70 MB compressed runtime with a standard library (or at least the most basic parts of it) and a couple KB "Hello world".
Why is the runtime so huge and why is so much of the standard library included?

Java ‘hello world’ compiled into a static binary is just a couple of MB.

If you use legacy .NET (which is preinstalled on Windows) a GUI Hello World is about 10 kB.
gRPC is one way to go about it if you have control over both the C++ and Go sides. Go binaries work pretty consistently across Linux distributions, which is a positive.
I would have thought most command-line users would be content to install with pip.
Yes, same for cx_Freeze etc. It really comes with a cost. If you have mid-sized applications which might spawn several processes of the executable, this becomes really problematic (CPU usage, memory usage).