by avidiax 661 days ago
It is sometimes used to allow one binary to be the symlink target of hundreds of commands.

Android does this for most common shell commands. Toybox and busybox are examples of such implementations.

https://github.com/landley/toybox

https://en.m.wikipedia.org/wiki/BusyBox
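
For illustration, name-based dispatch of this kind can be sketched in a few lines. This is not how toybox or busybox are actually implemented; the applet names and the `dispatch` helper are made up for the example.

```python
import os

# Hypothetical applets; a real multi-call binary registers many more.
def applet_echo(args):
    return " ".join(args)

def applet_true(args):
    return ""

APPLETS = {"echo": applet_echo, "true": applet_true}

def dispatch(argv):
    """Pick an applet from the basename of argv[0]."""
    # A symlink /usr/bin/echo pointing at the shared binary still shows up
    # as ".../echo" in argv[0], which is what makes the trick work.
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        raise SystemExit(f"{name}: applet not found")
    return applet(argv[1:])
```

Calling `dispatch(["/usr/bin/echo", "hi"])` runs the echo applet, while an unknown name exits with an error.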

4 comments

I just learned that rustup/rustc/cargo etc. work like this too. I couldn't understand why the gentoo formula was symlinking the same binary to a bunch of aliases.
On my system, these are hardlinks (regular files with a link count >1 and the same inode) rather than symlinks, though I'm not sure why.
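
A rough way to check this yourself, assuming a POSIX filesystem that supports hard links. The file names below are placeholders created in a temporary directory, not the real rustup layout.

```python
import os
import tempfile

def is_hardlink_pair(a, b):
    """True if a and b are two directory entries for the same inode."""
    # lstat does not follow symlinks, so a symlink and its target
    # report different inodes and are not mistaken for hard links.
    sa, sb = os.lstat(a), os.lstat(b)
    same_inode = (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
    return same_inode and sa.st_nlink > 1

def demo():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "rustup")  # placeholder name
        alias = os.path.join(d, "cargo")      # placeholder name
        with open(original, "w") as f:
            f.write("binary")
        os.link(original, alias)              # create a hard link
        return (is_hardlink_pair(original, alias),
                os.path.islink(alias),
                os.lstat(alias).st_nlink)
```

Here `demo()` reports a matching inode, no symlink, and a link count of 2.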
Maybe to avoid broken links if you move the original files? That's the main benefit of hardlinks vs symlinks in my mind at least.
That can also be a downside: you believe you have moved things, but now you can end up with different versions of a program coexisting, which the programs don't expect to be possible.
If there is a symlink, a hardlink, and an executable, all with the same name, which one will it run? Which one will the shell object to? Which one should the shell object to? If a virus/SUID program overwrites a symlink, no problem, but if it traces the symlink to the executable and then overwrites that...
And that makes a lot of sense, especially for binaries that are statically linked (as Rust binaries usually are), since that could save a lot of disk space!
clang does this too.
Also if you want a program to call itself, which is sometimes useful, this way lets you actually call the same program, rather than assuming the name and path.
Don't do this. If you (reliably) want the path to the current executable, there is no portable way to get it, but on Linux you readlink /proc/self/exe and on macOS you call _NSGetExecutablePath. I forget the API on Windows.
I would not put it so absolutely: /proc/self/exe has downsides as well. Because it resolves all symlinks, it breaks everything that depends on argv[0], such as nice help messages, Python's virtualenv, name-based dispatch, and detecting whether the program was executed via a symlink.
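
A minimal sketch of the Linux half of that advice. `current_executable` is a made-up helper name, and in a Python script the link names the interpreter binary rather than the script, so treat this as an illustration of the mechanism only.

```python
import os
import sys

def current_executable():
    """Return the path behind /proc/self/exe, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None  # other platforms need their own API
    try:
        # The kernel maintains this link; it does not depend on argv[0],
        # PATH, or the current working directory.
        return os.readlink("/proc/self/exe")
    except OSError:
        return None  # e.g. /proc is not mounted
```

Callers have to handle the `None` case, since there is no portable fallback.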

Often you know you never called chdir(), in which case I'd actually recommend executing argv[0], as this is the nicest thing for admins. If you are really worried, you can use /proc/self/exe for progname and pass argv[0] as-is, but that's often overkill.

Those are all cases where you're using argv[0] as an argument to the program where it's appropriate. Using it as the path to spawn a child process is incorrect. You're free to re-use it as an argument.

I have fixed enough software that made this mistake that I'm confident being absolute about it. It's a very easy mistake to make, but it's really annoying when software makes it and someone needs to deal with it at a higher level. It's better for developers to know that argv[0] isn't the path to the executable; it's what was used to invoke the executable.

What’s the issue with using argv[0] as a way to spawn yourself? I don’t recall running into a lot of issues.
If it's a relative path, then changing the working directory will break (chdir("/") is a very common tactic at the top of main()).

It's possible/desirable for the parent to change the PATH of a child process, particularly one that spawns other processes. So the argv[0] used to spawn the original process may be garbage for spawning children.

Similarly in any kind of chroot jail (which may or may not be docker these days), relative paths and PATH can be garbage even if they don't change.

The real problem is that I've seen in-house and open source frameworks/libraries that have a function like `get_executable_path` that reads `argv[0]` and this is just incorrect behavior. Spawning yourself is one of the less risky things you can do, but there are gotchas and a way to avoid them!
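
To make the first gotcha concrete, here is a small sketch, assuming a POSIX system. The file name `hypothetical_tool` and both helpers are invented for the example.

```python
import os
import tempfile

def argv0_resolves(argv0, cwd):
    """True if argv0 names an existing file when interpreted from cwd."""
    return os.path.isfile(os.path.join(cwd, argv0))

def demo():
    with tempfile.TemporaryDirectory() as start_dir:
        tool = os.path.join(start_dir, "hypothetical_tool")
        with open(tool, "w") as f:
            f.write("")
        argv0 = "./hypothetical_tool"  # what a shell would pass for ./tool
        before = argv0_resolves(argv0, start_dir)  # directory we started in
        after = argv0_resolves(argv0, "/")         # as if chdir("/") had run
        return before, after
```

The same string that worked in the starting directory no longer points at anything after the working directory changes.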

I think you forget the exec system call’s first argument is a path to an executable, followed by an array of arguments, where arg[0] lives.

I can’t find an issue with exec("/proc/self/exe", [program, …]).

Well, it could be, for example, that /proc is not mounted. A lot of software breaks in that case, when there is really no need for it to. Also, that approach only works on Linux; if you want to write portable software, what do you do?
I am mainly pointing out that arg[0] is still valid. Writing portable software is an entirely different topic.
Note, though, that both of these solutions are racy, so they should not be used if "someone symlinking really fast and swapping the binaries" is in your threat model. Linux's /proc/self is safe, though; just not the result from readlink.
Well that's true, but also something that can't be addressed within a currently running process afaik.
There's also this very handy and tiny cross-platform library:

https://github.com/gpakosz/whereami

Four cardinal sins of programming: 1. Self-modifying code. (The word 'recalcitrant' comes to mind.) 2. Calling your own program to execute itself. 3. Interrupting the flow of control with a jump. 4. Non-graceful exit. 5. Renaming 'hack' as 'vi' or 'ps'.
There's no guarantee that the name and the path are still the same executable that is running, or that they even exist anymore.
In most of the variants of exec*() there are separate arguments for the thing to be executed and the *argv[] list. Argv[0] being the executable is just a convention. In perl $ARGV[0] is the first positional parameter. In

    $ perl myscript.pl a b c
$ARGV[0] is "a".
I mean sure. All software is built on assumptions. Make sure the assumptions you’re making are appropriate in context.
Unless you are on Windows
You can actually rename an executable that is running, on Windows. That's a way to handle self updates: rename the executable, create its replacement, execute the new one to make it remove the old executable.
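
The rename-then-replace sequence might look roughly like this. `stage_update`, `remove_stale`, and the `.old` suffix are assumptions for the sketch, and it only moves files around; it does not launch anything.

```python
import os

def stage_update(exe_path, new_contents):
    """Move the current executable aside and write its replacement."""
    old_path = exe_path + ".old"
    os.replace(exe_path, old_path)   # step 1: rename the existing file
    with open(exe_path, "wb") as f:  # step 2: create the replacement
        f.write(new_contents)
    return old_path

def remove_stale(exe_path):
    """Run by the new version on startup: delete the renamed old copy."""
    try:
        os.remove(exe_path + ".old")  # step 3: clean up
        return True
    except FileNotFoundError:
        # Attempt the removal directly rather than checking first,
        # so there is no check-then-act window.
        return False
```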
Beware TOCTOU problems when doing this.
You can do this without assuming the name by execing /proc/$PID/exe. Then you're not vulnerable to the argv[0] spoofing described in the article. (But of course since argv[0] does exist, you should set it properly and pass through your own argv[0] unchanged.)
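
A Linux-only sketch of that idea: the path to execute and the argv list are separate, so argv[0] can be passed through untouched. The helper names are hypothetical.

```python
import os

SELF_EXE = "/proc/self/exe"  # Linux-specific; needs /proc to be mounted

def reexec_call(argv):
    """Build the (path, argv) pair for re-executing the current binary."""
    # The path comes from the kernel; argv, including argv[0], is
    # forwarded exactly as this process received it.
    return SELF_EXE, list(argv)

def reexec_self(argv):
    """Replace this process with a fresh copy of the same executable."""
    path, args = reexec_call(argv)
    os.execv(path, args)  # does not return on success
```

In a Python program the binary behind /proc/self/exe is the interpreter, so the list to forward would be `sys.orig_argv` (Python 3.10+) rather than `sys.argv`.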
That's not portable, though. OpenBSD, for example, doesn't have /proc.
That’s Linux only. Wouldn’t even work on macOS, which would likely be a significant number of your users.
coreutils-static did this too. The advantage of shared libraries and multiple-use single static binaries is they're only loaded once.
The article discusses this.