Hacker News new | ask | show | jobs
by js2 661 days ago
> Today however, disk space is no longer considered an issue; this is evidenced by macOS Sonoma, where shutdown and reboot are two separate executables.

Try running `ls -li /usr/bin` on macOS and you might be surprised to learn that all of these are a single executable: DeRez, GetFileInfo, Rez, SetFile, SplitForks, ar, as, asa, ... yacc. There's 77 different entries in `/usr/bin` (including `git` and `python3`) that are all links to the same binary (`com.apple.dt.xcode_select.tool-shim`). It's a wrapper that implements the `xcode-select` concept to locate and run the real executable provided by either the Command Line Tools package or a particular Xcode version you may have installed.

And that's not the only one. There's another 68 links starting with `binhex.pl` and ending with `zipdetails` that are a single 811 byte wrapper-script around perl.

Altogether, I see that there are 26 different names that are multiply linked:

  ls -li /usr/bin |
  awk '{print $1}' | 
  sort | uniq -c | 
  sort -n | grep -v "\s*1\s" | wc -l
Some of the other examples: less & more, bc & dc, atrm & batch, stat & readlink.

Having a program behave dynamically based on argv[0] is a useful tool in the Unix toolbox. The alternative would be compiling 77 different versions of `tool-shim,` creating 68 different versions of that perl wrapper, etc.

The `git` binary uses this concept too. You can create an executable named `git-foo`, put it anywhere in your PATH, and then call it as `git foo`.

In the end, argv[0] is just an argument that can be used to improve CLI ergonomics and reduce code duplication. It's not solely about disk space. I think that makes it a more common and useful concept than you give it credit for.

As to the rest of the post: I'm not really sure how argv[0] being in the caller's control is any different than the rest of the execution context being in the caller's control: the remaining arguments, the environment, limits on file descriptors, which file descriptors are open, the program's real and effective uid and gid, signals it might receive and so on. These all amount to untrusted input any executable has to be cognizant of, more or less so depending upon what privileges the executable has and what its goals are.

1 comments

Besides, disk space is not an issue, but container image size still can be an issue because those have to be copied around the network, and it's easy to have thousands of 10GB images consume more disk space than you might have thought you'd need.
Surely the duplication would be (mostly) compressed away?