by yjftsjthsd-h 661 days ago
So obviously claiming that there's no good reason for a process to read argv[0] is either demonstrating the author's ignorance or needs a much stronger defense; I'd be fascinated to hear how they think busybox should work on an OpenWrt box with a 16MB root filesystem.

However, I am willing to consider the discussion about whether there could be merit to restricting the ability to write that value; I could imagine a system that populated it only from the actual file name and did not allow it to be written by the parent process or the child process at runtime. The obvious place this still falls apart is that an attacker could just

    ln /bin/curl ./some\ other\ name
but there are sometimes security measures that we use even though they're less than 100% effective, so it is at least conceivable that this might be a trade-off worth making.

I agree. I think the author really shot themselves in the foot when they criticized, at length, the merits of a program using argv[0].

The real point is the security flaws in a calling program setting argv[0], because it really, really should be set by the operating system. (As a programmer, I shouldn't have to defend against these kinds of attacks. The OS should block it.)

The criticisms of valid programming practices, IMO, hurt the author's credibility and distract from the real point of the article.

The real security flaw is extracting a value from a process's own memory to identify what the process is. If you want a secure way to identify what a process is and where it came from, that needs to be a new feature in the OS.

argv[0] was designed to be part of the arguments to the program, and it succeeds perfectly at that task. The problem is that it has been abused by external tools as a way to identify the program just because there was no other alternative.

It has to be writable because the entire argv string (in program memory) is writable and declared as

  int main(int argc, char **argv)
not

  int main(int argc, const char **argv)
and needs to preserve back-compat. Classic C code might be calling strtok on the arguments, so that block of memory needs to remain writable.
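To make the strtok point concrete, here's a minimal sketch of the kind of classic code that relies on argv strings being writable. The `count_fields` helper and the comma-separated argument are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a writable argument such as "a,b,c" in place.
 * strtok() overwrites each ',' with '\0', which is only legal because the
 * argv strings are plain char, not const char. Returns the field count. */
size_t count_fields(char *arg)
{
    assert(arg != NULL);
    size_t n = 0;
    for (char *tok = strtok(arg, ","); tok != NULL; tok = strtok(NULL, ",")) {
        n++;
    }
    return n;
}
```

Called as `count_fields(argv[1])`, it leaves `argv[1]` reading as just its first field afterwards; with `const char **argv`, the same call wouldn't compile without a cast.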
> The real security flaw is extracting a value from a process's own memory to identify what the process is. If you want a secure way to identify what a process is and where it came from, that needs to be a new feature in the OS.

How would that help? After all, even if this info comes from the OS, the decision logic still lives in your process's memory which the parent process still has full access to.

Take a closer look at the exploits listed: they all have to do with malware manipulating argv[0] when creating a new process, not with a process manipulating argv after it starts.

There is no mention of mutable memory attacks.

(If I were at a computer, I'd fire up a C IDE just to see what happens when I mutate argv. I suspect the OS keeps its own copy of what the process was started with.)
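For what it's worth, my understanding is that on Linux the kernel does not keep a separate copy for /proc/PID/cmdline: it reads the arguments back out of the process's own memory, so an in-place write to argv[0] shows up in `ps` output, while /proc/PID/exe is unaffected. A rough, Linux-only sketch to try it; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Overwrite an argv string in place without growing it: the new text is
 * truncated to the old length and the leftover bytes are zero-filled. */
void overwrite_arg(char *arg, const char *replacement)
{
    assert(arg != NULL && replacement != NULL);
    size_t room = strlen(arg);              /* reusable bytes, excluding NUL */
    size_t len = strlen(replacement);
    if (len > room) len = room;
    memcpy(arg, replacement, len);
    memset(arg + len, '\0', room - len);
}

/* Read the first NUL-terminated string of /proc/self/cmdline (Linux only).
 * Returns 0 on success, -1 if the file cannot be opened. */
int read_cmdline_argv0(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    FILE *f = fopen("/proc/self/cmdline", "rb");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';   /* as a C string, buf now holds just argv[0] */
    return 0;
}
```

In `main`, calling `overwrite_arg(argv[0], "renamed")` and then `read_cmdline_argv0(buf, sizeof buf)` should, if that understanding is right, give back the new (possibly truncated) name. Writing past the original length is the unsafe part, which is why the helper truncates instead.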

It's not about mutable memory attacks, it's about not understanding the purpose of argv[0]. argv[0] is an argument, you are supposed to be able to set it to whatever you want. You are not supposed to rely on an argument to identify a program, that is nonsensical.

The problem here isn't argv[0]; the problem is security software not understanding what argv[0] is. If you want security software to be better able to identify processes, the solution isn't changing argv[0], it's implementing actual process ID checking.

> it has been abused by external tools as a way to identify the program just because there was no other alternative.

There is an alternative, at least on linux: /proc/$pid/exe.

And if your question is "what executable is running" that is a better way to get it. But for a program like busybox, argv[0] is also important.
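A minimal Linux-only sketch of reading that link for an arbitrary PID; `exe_path_of_pid` is a hypothetical helper name, and reading another user's process will typically fail with a permission error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux only: resolve /proc/<pid>/exe, the kernel's record of which file the
 * process is executing, regardless of what its argv[0] says. Returns the
 * path length, or -1 on error (no such process, no permission, or a result
 * that may have been truncated). */
ssize_t exe_path_of_pid(pid_t pid, char *buf, size_t size)
{
    assert(buf != NULL && size > 1);
    char link[64];
    snprintf(link, sizeof link, "/proc/%ld/exe", (long)pid);
    ssize_t n = readlink(link, buf, size - 1);  /* readlink does not NUL-terminate */
    if (n < 0 || (size_t)n == size - 1) return -1;
    buf[n] = '\0';
    return n;
}
```

If the binary was deleted after the process started, the kernel appends " (deleted)" to the link target, so even this answer needs a little care.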

Pretty much any OS lets you examine which binaries have been mapped into the process address space, meaning there are plenty of alternatives.
It's simply a case of "Do use argv[0] for this, and don't use it for THAT."

Both Windows and Linux provide APIs to get the actual path of the executable. POSIX, to the best of my knowledge, does not. Regrettably. And the Linux API is, admittedly, a bit weird. But not that difficult really. Nothing that you can't get Claude to spit out for you in under 45 seconds. ;-P

The contents of argv[0] are yours to use and abuse as you see fit. Operating systems don't know or care if you go trampling recklessly through the contents of argv[0].

And the author's contention that "power --shutdown" and "power --reboot" are viable alternatives to "shutdown" and "reboot" seems... disingenuous, which is the politest word I can come up with.

And, if you haven't asked yourself, "wait a second, what happens if somebody passes me garbage via execve" before you are halfway through writing the substantial amount of code required to portably normalize argv[0] to an executable path, I don't think you can be trusted to write secure code of any form. Just normalizing the various forms of argv[0] that a Linux shell passes you is a non-trivial effort. So don't use it for THAT.
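To give a sense of the effort, here is a best-effort sketch that mimics a shell's PATH lookup for argv[0]. It's an assumption-laden approximation (it trusts the current PATH and doesn't distinguish directories from files), and `guess_path_from_argv0` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Best-effort guess at the file behind argv0, mimicking a shell's PATH
 * search. Returns 0 and fills `out` on success, -1 otherwise. */
int guess_path_from_argv0(const char *argv0, char *out, size_t size)
{
    assert(argv0 != NULL && out != NULL && size > 0);
    if (argv0[0] == '\0') return -1;
    if (strchr(argv0, '/') != NULL) {          /* "./x", "../x", "/bin/x" */
        int n = snprintf(out, size, "%s", argv0);
        if (n < 0 || (size_t)n >= size) return -1;
        return access(out, X_OK) == 0 ? 0 : -1;
    }
    const char *dir = getenv("PATH");
    if (dir == NULL) return -1;
    for (;;) {
        size_t len = strcspn(dir, ":");
        /* An empty PATH entry traditionally means the current directory. */
        int n = (len == 0) ? snprintf(out, size, "./%s", argv0)
                           : snprintf(out, size, "%.*s/%s", (int)len, dir, argv0);
        if (n >= 0 && (size_t)n < size && access(out, X_OK) == 0) return 0;
        if (dir[len] == '\0') break;
        dir += len + 1;
    }
    return -1;
}
```

Even so, it only produces a guess: the caller of execve() could have passed any string at all, and PATH may have changed since the process was launched.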

I see a common anti-pattern in security researchers in that they can lose sight of the human beings who operate the software.

argv[0] should be used by any logging message that purports to report the program name, because argv[0] should be a string the human recognizes as something they invoked. Taking it away would break usability.

This does, of course, imply that the program name is non-constant untrusted data. Which means we shouldn't be making security software that depends on knowing that name.
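A small sketch of that logging convention, treating argv[0] purely as display text; the function name and the "mytool" fallback are placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name: message" using the name the user invoked, as most Unix
 * tools do. argv[0] is caller-supplied and may be missing or empty, so
 * fall back to a fixed name; it is used only for display, never for
 * decisions. "mytool" is a placeholder. Returns snprintf's result. */
int format_diagnostic(char *buf, size_t size, const char *argv0, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    const char *name = (argv0 != NULL && strlen(argv0) > 0) ? argv0 : "mytool";
    return snprintf(buf, size, "%s: %s", name, msg);
}
```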

That seems unnecessarily harsh.

I don't think that's the gist of the article, but the throwaway suggestion of 'just make lots of copies, who cares about disk space' is insufficient and thus distracts. It's... a single line about solutions in an article that isn't _about_ solving problems; it's about highlighting a problem exists and that it's worth solving.

I read the article more as: There is __often__ no good reason to use argv[0], and it should be avoided if at all possible, and if it cannot be avoided, it would behoove the industry to work on ways to make sure in the future it can be avoidable.

For example, why in the blazes does windows taskman.exe list argv[0] in the GUI table view? That's just asking for trouble. Show the actual file path, and always an absolute one - that way you avoid confusion about which executable you're actually running, and it's just as readable if not more readable for every app _except_ those who care about argv[0], e.g. if you ran `/bin/dd` and it's actually busybox, in taskman you'd see `/bin/busybox` instead which'd be worse than seeing 'dd'. That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly), but, now we're talking about coordinating between OS, glibc, busybox, and so on - lots of parties. I don't mind that the article doesn't delve that deep, as that wasn't the point of it. The point is simply to show the problems the kludge of 'we will show argv[0] instead of the executable name' causes.

This article feels more about explaining that in the distant past, a mistake was made, with some history as to why that mistake was made and the deleterious practical effects that this mistake is causing or is likely to cause (most of them security related). It's not really about solving the problem; that presumably comes later and should be sketched out by those who are knowledgeable on _that_ subject. That doesn't imply the author is ignorant or that the article is insufficiently defended. Just that it hasn't covered all aspects of what it's writing about.

> Show the actual file path, and always an absolute one - that way you avoid confusion about which executable you're actually running, and it's just as readable if not more readable for every app _except_ those who care about argv[0], e.g. if you ran `/bin/dd` and it's actually busybox, in taskman you'd see `/bin/busybox` instead which'd be worse than seeing 'dd'.

This was kind of in the middle of your complaint about windows, but then you've got unixy busybox discussion.

On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'. All of the names are equally valid. You could show the filesystem and inode number, which should uniquely identify the file, but is pretty user unfriendly.

> On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'.

But each of the multiple names points to the same actual data, so it doesn't matter which one is shown. The obvious choice would be to show the absolute path that the OS used to load the executable.

> On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'

The same is true for hard linked files on Windows. That never stops Windows from showing you a path.

There is almost always an "obviously right" path (the one used when opening the file). And if you lost track of that, deterministically choosing one of the possible paths is almost always more user friendly than just showing inode numbers.

> There is almost always an "obviously right" path (the one used when opening the file).

The path used when opening a file is easy to get confused about. If your cwd was renamed or deleted since you entered it, and you opened an executable with a relative path, what is the "obviously right" path then?

In context, there's still a right answer; the absolute path used to run the process should always be right, because Windows locks the file (I'm sure with enough effort this can be made wrong, but I'm also sure that's not trivial).

Really, this discussion just shows the question is the problem; if you're asking "what path was used to launch the process", that's easy to keep track of and always get right. If you're asking "what is the path right now to the file that launched the process", maybe that has no answer.

I don't think that really follows. If the OS wants to track this, it should canonicalize the path to the executable on startup, and then stash it somewhere.

(And a program could do that itself, if it wants to.)

> highlighting a problem exists

Coding bugs into your programs is not a problem; it’s a bug. None of the weird argv[0] examples can happen on the shell (without escaping), only when using system calls.

The more I read the article, the more I feel this is a reaction to behavior the author did not expect; fancying themselves smart, they conclude that the last 20 years of usage of this feature are obviously wrong.

> None of the weird argv[0] examples can happen on the shell (without escaping), only when using system calls.

  $ help exec
  [...]
  Options:
    -a name pass NAME as the zeroth argument to COMMAND
Even in shell, you can explicitly specify the argv[0] when running an executable.
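The same thing from C, as a sketch: posix_spawn() takes the path to execute and the argument vector separately, so argv[0] is whatever the caller puts in the array. `run_with_argv` is a hypothetical wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run `path` with a caller-chosen argument vector and wait for it.
 * argv[0] is simply the first array element: nothing ties it to `path`.
 * Returns the child's exit status, or -1 if posix_spawn/waitpid fail or
 * the child did not exit normally. */
int run_with_argv(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL && argv[0] != NULL);
    pid_t pid;
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0) return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with `char *args[] = {"totally-not-sh", "-c", "exit 3", NULL};`, `run_with_argv("/bin/sh", args)` runs /bin/sh and returns 3, while `ps` would list it as `totally-not-sh` for as long as it ran.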
Not in all shells, but in some, exec is a pass-through to the system call…

Bash is a language, so again we are telling it to do something silly and then calling that a security problem.

The issue is not argv[0] but uninformed expectations about how these systems work.

Relying on the program/command name for security and not the executable path is a bug.

Furthermore, if a bad actor has enough access to run exec, you are probably in a bad way.

The whole post also seems not to understand that both Windows and Linux have ways to change this display after the executable is running, via SetConsoleTitle and prctl, or by simply modifying argv[0] directly.

There's `setproctitle` in FreeBSD, which is designed exactly for a process to update the information that is presented to tools such as ps.

https://man.freebsd.org/cgi/man.cgi?query=setproctitle&aprop...

There's also getprogname(3) on a lot of systems, and the __progname variable. I seem to recall this is an area where various Unix like systems have slight variations.
prctl(PR_SET_NAME) on Linux - sets the thread name, and the name of the main thread is shown as the process name in most tools.
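A Linux-only sketch of the prctl route; the wrapper names are hypothetical, but PR_SET_NAME and PR_GET_NAME are the documented options. Note this changes the short "comm" name (e.g. /proc/PID/comm), not argv or /proc/PID/cmdline.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Linux only. PR_SET_NAME sets the calling thread's short name ("comm");
 * for the main thread that is what /proc/PID/comm and tools such as top
 * show by default. The kernel keeps at most 15 bytes plus a NUL and
 * silently truncates longer names. */
int set_thread_name(const char *name)
{
    assert(name != NULL);
    return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}

/* buf must have room for 16 bytes, as PR_GET_NAME requires. */
int get_thread_name(char buf[16])
{
    memset(buf, 0, 16);
    return prctl(PR_GET_NAME, (unsigned long)buf, 0UL, 0UL, 0UL);
}
```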
> it's about highlighting a problem exists and that it's worth solving.

If so, then I disagree with the premise of the article, fundamentally. I don't see a problem. If someone is writing security software and doesn't already know about the mutability of argv[0], and doesn't know that (on Linux at least) /proc/$PID/exe is the only correct way to get the binary backing a process... well, then they have no business writing security software.

There is no problem here. The author is making a big deal about nothing, either because they have a weird axe to grind, or because they're ignorant.

>Show the actual file path, and always an absolute one

There are numerous reasons why this is not desirable, for example knowing whether an application was called from one symbolic link or a relative path dictates what that application's working directory is.

It's easy to call something a mistake in hindsight.

You could argue the mistake was done elsewhere so this feature could be abused.

“That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly)“

We could call it setproctitle, or something. \s

Not the author, but there's a good alternative. If busybox were edited to ignore argv[2], then applets could be called via shebangs, instead of symlinks:

    $ echo '#!/path/to/busybox echo' > myecho
    $ chmod +x myecho
    $ ./myecho 123
    ./myecho 123
Right now this doesn't work properly, because "./myecho" (argv[0]) gets placed into argv[2] of the process. Otherwise, this technique IMHO is better than symlinks:

- Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

- Doesn't read or write to argv[0].

- You could finally rename the applets. This is not that useful if busybox is your only posix userspace implementation, but very useful if you want many implementations to live side-by-side. E.g. on macOS, I'd like to have readlink point to BSD/macOS's readlink, greadlink to GNU coreutil's, bbreadlink to busybox's.
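To see why the example prints "./myecho 123", here's a small model of how, as I understand it, Linux rebuilds the argument vector for a `#!` script. Strictly speaking, it's the path handed to execve() that lands in that slot, not argv[0], though from a shell they're usually the same string. This is not kernel code, and `shebang_argv` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Model of argv rewriting for "#!interp opt_arg": the original argv[0] is
 * dropped; the new vector is the interpreter, the optional shebang
 * argument, the script path given to execve(), then argv[1..] unchanged.
 * `out` needs room for argc + 3 pointers. Returns the new argc. */
int shebang_argv(const char *interp, const char *opt_arg,
                 const char *script_path, int argc, char *const argv[],
                 const char *out[])
{
    assert(interp != NULL && script_path != NULL && out != NULL);
    int n = 0;
    out[n++] = interp;
    if (opt_arg != NULL) out[n++] = opt_arg;
    out[n++] = script_path;
    for (int i = 1; i < argc; i++) out[n++] = argv[i];
    out[n] = NULL;
    return n;
}
```

For `./myecho 123` with `#!/path/to/busybox echo`, the result is `/path/to/busybox echo ./myecho 123`: busybox runs its `echo` applet, and echo dutifully prints the script path along with `123`.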

But as I said, this doesn't work for now. The best you can do now is to write shell two-liners https://news.ycombinator.com/item?id=41436012. Some such two-liners may also fit within the inode inlining limit, so that's a plus. But you will pay a performance penalty on every call (since sh needs to start up).

> Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

Is that really the case? AFAIK, OpenWRT uses SquashFS by default, and a quick web search tells me that "[...] In addition, inode and directory data are highly compacted, and packed on byte boundaries. Each compressed inode is on average 8 bytes in length [...]" (https://www.kernel.org/doc/html/latest/filesystems/squashfs....). That is, even if the content fits into the inode, it will make the inode use more space (they're variable-size, unlike on traditional filesystems with fixed-size inodes).

And using hardlinks (traditionally, we use hardlinks with busybox, not symlinks) goes even further: all commands use a single inode, the only extra space needed is for the directory entry (which you need anyway).

Well that would be inefficient. For each command you run the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory. That could be a performance problem, since busybox is typically used in embedded systems that don't have a lot of resources: imagine a shell script that runs a command in a loop; it has to do a lot of extra work.

Finally, symlinks can be relative, while the solution you proposed is not. This is particularly useful for distributing software, e.g. distributing a tar file with the busybox itself and their symlinks.

In fact, you don't even need symlinks at all: you can use hard links, which could save even more disk space on embedded filesystems that are read-only images anyway.

> Well that would be inefficient. For each command you run the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory.

Those that exist today would, but no kernel would have to work like that.

Once you've agreed that monolithic kernels have merits, you've accepted that the kernel can do whatever it wants to make this efficient—including being complicit in this scheme and leapfrogging over most of what you just described.

> Those that exist today would, but no kernel would have to work like that.

That's a pretty weird argument. "Yes, what you say is completely correct, but let's imagine a world where you were wrong."

We have what we have, today. We should form conclusions and make decisions based on things that exist, not on things that we might dream up.

I’m going to challenge you on the performance angle. Instead of doing the shebang line, it has to traverse the filesystem to resolve the link. I suspect that’s probably more expensive than parsing the shebang line. Indeed, a shell script that runs a command in a loop should have busybox detecting the built in command & executing it inline without spawning executables via the file system (this is common in bash as well btw).

There are valid reasons but I think the performance angle is the weakest argument to make.

> Instead of doing the shebang line, it has to traverse the filesystem to resolve the link. I suspect that’s probably more expensive than parsing the shebang line.

I highly doubt that. Path traversal is one of the most optimized pieces of code in the Linux kernel, especially for commonly accessed places like /bin where everything is most likely already in the dentry cache. For the script with a shebang on the other hand it first has to read it from disk (or the page cache), then parse the path from it, and then do a path traversal anyway to find the referenced file.

Imagine the performance problems of running 'shutdown' and 'reboot' in a tight loop!

Besides, should one really write something performance critical for embedded in shell in the first place?

I was going to say it'd be easier to have a single script, e.g.

    #!/bin/sh
    exec busybox "$0" "$@"
and then every command required could just be a hardlink to the same script, instead of replicating it over and over again for hardcoded command names.

Then I realised the whole point is to posit a world where $0 doesn't exist, and we're not allowed to be clever about it.

In such a world, shells would probably have something like a $SCRIPT_NAME to work around this.
Are shebangs recursive? Otherwise this means that busybox can no longer provide /bin/sh.
Another example program that reads argv[0] is Rustup, the version manager for Rust. Rust versions can be set per directory either as a machine-specific override or via a file. Rustup is symlinked to all the Rust commands like rustc and cargo, and when invoked as one of those commands, it checks what version it is supposed to be using, and then forwards to that version. I don't see how you'd do this without argv[0] (or a dozen slightly different pointlessly recompiled binaries).
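A rough sketch of the argv[0] part of such a proxy. The toolchain directory layout and the `proxy_target` name are assumptions for illustration, not how rustup is actually implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rustup-style proxy step: take the name the binary was
 * invoked as ("cargo", "rustc", ...) and build the path of the same tool
 * inside the selected toolchain. Returns 0 on success, -1 if the result
 * does not fit in `out`. */
int proxy_target(char *out, size_t size, const char *toolchain_dir,
                 const char *argv0)
{
    assert(out != NULL && toolchain_dir != NULL && argv0 != NULL);
    const char *slash = strrchr(argv0, '/');
    const char *tool = slash ? slash + 1 : argv0;   /* basename of argv[0] */
    int n = snprintf(out, size, "%s/bin/%s", toolchain_dir, tool);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

The proxy would then exec that path, passing its own arguments through unchanged.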
For busybox/toybox the argv[0] thing is great, and seems to be the prime example of why argv[0] shouldn't go - yet it is a bit of an anomaly in how argv[0] is used.

If there really is a need for having one executable that comprises multiple commands, is `busybox whoami` instead of `whoami` so much more effort? To me, that would make more sense in terms of what is going on; aliases could be used if one-word commands are preferred. In most non busybox contexts, argv[0] is just an unnecessary addition that, as the linked article shows, can introduce weirdness.

It's clear from the comments there are still many who think argv[0] is a good thing, which is great - I'm glad the post sparked this debate.

> is `busybox whoami` instead of `whoami` so much more effort?

It's not the "more effort" that is the deal breaker here. It is a matter of compliance with specs and user expectations. What you're suggesting would make Busybox very non-POSIXy, very non-Unixy. All scripts written over the last many decades would need to be updated to call `busybox ls` instead of `ls`? How is that a viable solution?

> I'm glad the post sparked this debate.

This is a very strange way to deflect concerns about quality of the article!

Yeah. The whole point of busybox is to provide the POSIX commands in one compact executable. Making things work any other way defeats the entire purpose of busybox.
In other words: `busybox` is primarily an implementation of a _standard library_ and only secondarily a command line tool, so it _must_ use the standard names.
Given that 'alias' is in POSIX, would a combination of

(a) the hypothetical non-argv0 busybox being discussed, and

(b) a POSIX shell of the maintainer's choice, with built-in aliases for 'ls=busybox ls'

be sufficient to make the system POSIX compliant?

Aliases are not inherited by subprocesses, unfortunately! So the alias solution would not work when a shell script launches other shell scripts. It wouldn't work in a wide range of other scenarios either, like Makefiles, bespoke build tools, binary executables that do execve("/usr/bin/cmp", ...), etc.
In addition to the already-raised issue of subprocesses not inheriting aliases, I'd also be worried about aliases inherently being specific to particular shells. I'd hate to have to redefine those aliases for sh, csh, zsh, fish, and Lord knows what else. It'd also be an issue for invoking those tools without going through a shell in the first place - as is common for programs launching external programs as subprocesses.

That's indeed why I personally don't use shell aliases at all, instead opting for actual shell scripts in my $PATH. Those will work no matter what shell I'm using (if any).

`busybox whoami` is probably fine, but having to write `busybox ls`, `busybox grep`, `busybox cp` etc. would get tedious quickly.

Shell aliases don't solve all problems, even if you do:

    alias rm="busybox rm"
    alias xargs="busybox xargs"
    # etc.
you still have to write `xargs busybox rm`, because xargs won't use the shell alias.

But the main problem with this approach is that POSIX and LSB require certain binaries to be available at certain paths. When they're not, most shell scripts will just break.

The minimal standard solution is probably to create shell scripts for all of these, e.g. in /bin/ls:

    #!/bin/sh
    exec /bin/busybox ls "$@"
But this both adds runtime overhead (on every invocation!) and is quite wasteful in terms of disk space. Busybox boasts over 400 tools. At 4 KB per file, that's roughly 1.6 MB of just shell scripts. Of course that can be less if the file system uses some type of compression, which is common on embedded systems where storage space is small, but it still seems to defeat the purpose of using busybox to create a minimal system.
Well /bin/sh is also busybox, so I think you'd need

    #!/bin/busybox sh
    exec /bin/busybox ls "$@"

?
Great point!

Actually this observation invalidates the whole setup. Because even though you could define /bin/sh itself as:

    #!/bin/busybox sh
    exec /bin/busybox sh "$@"
Then you still cannot use #!/bin/sh in any other shell scripts, because for historical reasons many systems do not allow the interpreter of a script to be another interpreted script; it must be a binary (Linux tolerates a few levels of nesting, but that isn't portable). So /bin/sh pretty much has to be an actual binary.
Yes. Anybody who has shipped software would say.

I really don’t think it is a debate. The usage of argv[0] is massively understated by the article. Just go look at gcc or any modern-day compiler. It's used so much that the conversation about whether we should has been hashed out by many different groups, yet they still chose to implement it.

The security concerns are a non-issue, as argv[0] was not the problem. It was the lack of technical knowledge of how systems work and a flaw in the security application.

I think you’re both forgetting that bash has been using this trick for decades.

Bash has an sh compatibility mode that runs when you invoke it as sh.

Well, of course it's not only a matter of interactive usage (if only because busybox's own shell could do the conversion). The problem is scripts, or worse, programs that invoke commands as subprocesses (programs whose source code you may not even have access to!).

What do you do? Replace every single occurrence of each command by prefixing it with `busybox`? Not ideal at all...

https://pubs.opengroup.org/onlinepubs/9699919799/

You appear not to realize that busybox is an essential component of a POSIX-like system.

That's fine for when users are interactively typing commands, but it doesn't work when the command is being run by a non-busybox program which expects commands to exist in the standard locations.
Restricting setting it would break login.

Not that it couldn't be fixed by changing how we handle login shells but still. Worth remembering.

Similarly the busybox situation could be solved by having busybox ship posix shell wrapper scripts which use `#!/bin/busybox sh` as the shebang and simply consist of a line like `exec /bin/busybox ls "$@"`.

It can already do that, afaik. When I last checked, BusyBox supported installations via 4 methods:

  - symlink
  - hardlink
  - shell script wrappers
  - executable binary wrappers around libbusybox
Nice!
You are over-complicating it, you only need `#!/bin/busybox ls` as the entire contents of the file.
This doesn't work. The eventual command would be /bin/busybox ls /path/to/ls/wrapper ...
If hard linking (not symbolic linking) is used to install the BusyBox commands, then instead of argv[0], BusyBox could use the platform-specific means of obtaining the executable name, and take the basename of that path. On Linux this means /proc/self/exe; _NSGetExecutablePath on Darwin; getexecname on Solaris; GetModuleFileName on Windows; ...
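A Linux-only sketch of that idea; `applet_from_self_exe` is a hypothetical name. As far as I know, /proc/self/exe records the path the binary was opened through, so with hard links the basename is the command name, while with symlinks it resolves to the target ("busybox"), which is why this only helps hard-link installs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux only: take the command name from the kernel's view of the running
 * executable instead of from argv[0]. Returns a pointer into buf (the part
 * after the last '/'), or NULL on error. */
const char *applet_from_self_exe(char *buf, size_t size)
{
    assert(buf != NULL && size > 1);
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0 || (size_t)n == size - 1) return NULL;  /* error or truncated */
    buf[n] = '\0';
    const char *slash = strrchr(buf, '/');
    return slash ? slash + 1 : buf;
}
```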
https://github.com/util-linux/util-linux/blob/master/login-u...

edit: basically login(1) execs your shell with a - prepended to argv[0], so that's an example where POSIX expects this
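The shell side of that convention is tiny; a sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* login(1) starts the shell with '-' prepended to argv[0] (e.g. "-sh");
 * a shell checks for it to decide whether to act as a login shell. */
bool is_login_shell(const char *argv0)
{
    return argv0 != NULL && argv0[0] == '-';
}
```

Shells typically also accept an explicit option for this (bash has `--login`), but the leading dash is what login(1) relies on, and it only works because the parent gets to choose argv[0].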

Is there a good reason for allowing writes to argv at all?

I think any reason one will find are based on backwards compatibility.

Yeah, never mind shutdown/reboot, has the author heard of busybox?
Isn't the reason for the busybox multi-call binary mostly just ELF being bloated? So the answer for resource-constrained systems would be a more efficient executable format. I don't see why a multi-call binary + a bunch of symlinks would be intrinsically much more size-efficient than something purpose-built.
A lot of code is shared between different tools. Busybox has one copy of it. Before you mention shared libraries: there is still overhead, as well as complicating usage (it needs to find a shared lib when starting instead of just having all it needs in the binary). This isn't really a property of the executable format. Any format would have the same problem.
You can write a two-line shell script that prepends busybox per command. I've done this on a 16MiB restricted system, and while it ate maybe 4k per command, it wasn't a big deal with only 20-30 commands.
What about compiled binaries that for one reason or another are doing an execve() on "/usr/bin/cmp" or some such thing? Do you propose changing every script and every binary on earth that expects Busybox to be a POSIXy, Unixy environment?
On Unixes it doesn't matter if /usr/bin/cmp is a script or a compiled binary. If the script has correct shebang, kernel takes care of executing it.
Shebangs are not part of the UNIX specification. What happens if an executable starts with `#!' is implementation-defined.
UNIX/POSIX doesn't specify an executable format at all
is being able to run busybox part of that specification? Linux ELF files are tagged for Linux OS, not generic Unix OS.
> is being able to run busybox part of that specification?

It's the other way round. This is like asking if running RHEL is part of the specification? Obviously not. But RHEL provides an environment that is quite close to the specification.

So is running busybox part of the specification? Obviously not. But Busybox provides an environment that is close to the specification.

No you make this script:

   #!/bin/sh
   exec /bin/busybox cmp
And place it at /usr/bin/cmp
Surely you mean

  #!/bin/sh
  exec /bin/busybox cmp "$@"
`#!/bin/sh' makes this less portable than it could be, if /bin/sh doesn't exists on my system it won't work, for example. Remove that line and it'll work everywhere.
If /bin/sh does not exist, what in the world is executing the shell script?
So now to execute a program that could have been a direct run you have to fire up a shell, have it parse the file, and execute the instruction? Not really a great thing...

Plus, you have to know the absolute path of the executable busybox, not something you always know in advance.

You're not wrong about the extra execution of shell, although a lightweight shell like ash or dash doesn't have a huge overhead.

But realistically, the only people that need to worry about the actual location of the busybox executable are the people who write the install script - it would take that as an argument or variable and spit out all the little scripts as an automated process. The current installer already has to do most of this work as part of setting up the relevant links anyway.

As the original author says (but seems to forget within a paragraph or two), the program should already know what program it is. If you're looking at argv to find out what program you are, you are doing it deeply wrong. It's an argument.

One good use for it is to make a guess as to where your executable is installed. Yes, it would be nice if there were a more certain way to get that... but not for security purposes. You don't want to rely on filenames for security anyway, because anybody can make copies and symlinks and rename files at will, and it's really, really hard to catch all the cases of that. Much harder than, for instance, remembering that argv[0] is a hint from your caller, not gospel from the OS.

In the same way, I know that it's fashionable nowadays for incompetent idiots to write security tools, but a security tool that trusts an argv value for anything much was obviously written by an incompetent idiot, because that's not what they're for.

Am I missing something? You didn't seem to address the case where you actually need to know which program you are: the way busybox provides the whole suite of Linux utilities in one binary and requires the name under which it was invoked to know what to do.
Busybox still knows that it's busybox, and it is using that argument to decide which of its many functions to execute.

This person is arguing that that's somehow wrong because busybox, or more importantly some other software that's trying to monitor it, might get confused about whether it's busybox.

Busybox is a quite well-known project, but frankly, from the way you write about it, it does not look like you know how it works, so apologies if I'm explaining something that you already know.

Busybox is a reimplementation of the standard Linux utils (ls, find, dir, etc.) for resource-limited machines. To quote from the man page:

> BusyBox is a multi-call binary that combines many common Unix utilities into a single executable.

How it works is that it symlinks the binary to each of the commands it implements and then it executes the corresponding functionality based on the value of argv[0].
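A toy sketch of that dispatch, not BusyBox's actual code: a table maps names to functions, and the basename of argv[0] picks one. The `multi` binary name and the two stand-in applets are hypothetical, and the `multi NAME ...` fallback mirrors the `busybox NAME ...` form mentioned upthread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Stand-in applets; a real multi-call binary would have hundreds. */
static int true_main(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int false_main(int argc, char **argv) { (void)argc; (void)argv; return 1; }

static const struct { const char *name; applet_fn fn; } applets[] = {
    { "true", true_main },
    { "false", false_main },
};

static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Pick the applet from the basename of argv[0]. Invoked under its own
 * (hypothetical) name "multi", it shifts and dispatches on argv[1].
 * Returns the applet's exit status, or 127 if the name is unknown. */
int dispatch(int argc, char **argv)
{
    assert(argc >= 1 && argv[0] != NULL);
    const char *name = base_name(argv[0]);
    if (strcmp(name, "multi") == 0 && argc >= 2) {
        return dispatch(argc - 1, argv + 1);          /* "multi true" -> "true" */
    }
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++) {
        if (strcmp(applets[i].name, name) == 0) return applets[i].fn(argc, argv);
    }
    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```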

I know exactly how it works, thanks.

The hangup here seems to be the definition of "program". I'm using it to mean something roughly like "executable", which I think is fairly close to what the original article meant it to mean. You seem to be using some concept of "program" that makes each of busybox's functions a separate program. As far as I'm concerned, on the other hand, busybox is one big program that does a lot of largely unrelated things, choosing which of them to do based on how it's invoked. There's no right answer. You could say that all of the software running on a whole computer is one giant program, and in fact sometimes I do find it convenient to think of it that way.

I don't know that your definition of "program" is wrong, but I do think it's alien to this context.

OK, apologies.

Then your previous statement makes no sense in context. At least to me.

Yes busybox knows it's busybox. But busybox doesn't do anything if it is not invoked in a certain way which relies on argv[0] being what it is today. I am not sure what you're arguing for frankly.