
by derefr 3618 days ago

To me, for CLI commands, "easier" means two things:

1. more learnable, which usually translates to more discoverable, which in turn translates to more orthogonally-decomposed and regular. You want the grammars, syntaxes, sets of verbs/subcommands, parameter names, etc. of all the binaries on a system to obey the Principle of Least Surprise with respect to one another, so you can learn the syntax for one binary and then reuse that learning on another binary. (Much more likely to be done in a BSD than in Linux, but still possible.)

2. more wrappable, which usually also translates to being more orthogonally-decomposed and regular. You want your binaries to provide an API not only for humans, but for anything that wants to wrap your command in a GUI, or in a friendlier CLI UI, or in a network service, or whatever else. (If you write your binary as mostly a library with a small binary driver, this is even better—see e.g. curl's libcurl—but this is incredibly rare for some reason, and it's a bit much to demand that people refactor their entire codebases this way.)

Being wrappable means that your binary itself is 100% allowed to be "dumb" and "tedious" for experienced users (like in my tar pipeline-UX example), because aliases or scripts or entire conventional-command packages can be built around your binary.

The fact that there's no e.g. `docker gc` subcommand built into docker(1) is a feature, not a bug; the docker(1) binary just provides a set of orthogonal primitive subcommands. Someone else then wrote a docker-gc package containing a configurable script that calls docker(1) a bunch of times.
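To make the "script that calls docker(1) a bunch of times" shape concrete, here's a minimal sketch. It is not the actual docker-gc script, which does a good deal more; the `gc` and `run_docker` names and the injectable `run` parameter are made up for illustration. The subcommands and flags it uses (`ps --all --quiet --filter status=exited`, `images --quiet --filter dangling=true`, `rm`, `rmi`) are ordinary docker(1) primitives.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_docker(args: List[str]) -> str:
    """Run docker(1) with the given arguments and return its stdout."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def gc(run: Runner = run_docker) -> List[List[str]]:
    """Remove exited containers and dangling images.

    Hypothetical helper: composes primitive subcommands, returning the
    removal commands it issued so a caller can log them.
    """
    issued: List[List[str]] = []

    # One ID per line; --quiet prints IDs only.
    containers = run(["ps", "--all", "--quiet", "--filter", "status=exited"]).split()
    if containers:
        cmd = ["rm", *containers]
        run(cmd)
        issued.append(cmd)

    images = run(["images", "--quiet", "--filter", "dangling=true"]).split()
    if images:
        cmd = ["rmi", *images]
        run(cmd)
        issued.append(cmd)

    return issued
```

Because the policy lives in the wrapper, anyone who disagrees with it (say, they want to keep exited containers for a day) can edit twenty lines of script instead of patching docker(1).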

There should certainly be a high-level UX that doesn't require three separate commands to create a tar file. But tar(1), the core "engine" handling creating that archive, should not be the binary providing that convenient CISC UX. (Maybe you could follow the git model and allow people to call installed "tar-foo" binaries using `tar foo`, but even this is a bit silly.)
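For reference, the git-style lookup amounts to very little code. A rough sketch, assuming a hypothetical front end that resolves `tar foo` to a `tar-foo` executable on PATH; `resolve_subcommand` is an invented name, and the `which` parameter exists only so the lookup can be swapped out:

```python
import shutil
from typing import Callable, List, Optional


def resolve_subcommand(
    tool: str,
    args: List[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the argv for an external `<tool>-<sub>` binary, or None.

    None means "no such plugin": the caller should fall through to the
    tool's built-in behaviour.
    """
    # Nothing to dispatch on, or the first argument is an option.
    if not args or args[0].startswith("-"):
        return None
    path = which(f"{tool}-{args[0]}")
    if path is None:
        return None
    # Hand the remaining arguments to the plugin untouched.
    return [path, *args[1:]]
```

One wrinkle specific to tar(1): it accepts bundled old-style options without a dash (`tar cf a.tar dir`), so the first argument shares a namespace with any would-be subcommand names.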

But personally, in my mind, tar(1) itself should be a low-level implementation-detail sort of binary anyway, like mknod(8) or losetup(8). tar(1) only handles .tar files, and that's a silly UX for any user-visible task. The only time you already know you've got a .tar file, or want a .tar file, is in a script—and in a script, you want exactly the kind of verbosity, strictness, and interposed-shell-gloop flexibility I was talking about.

Meanwhile, what a user wants 99% of the time for their TTY-interactive use, is a command equivalent to the things a GUI archive-manager does. Users don't want or need tar(1); they want an xdg-archive(1) that uses libarchive.
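As a very rough sketch of what that front end's core might look like: xdg-archive(1) doesn't exist, and this uses Python's stdlib `shutil` as a stand-in for libarchive, so take it as the shape of the UX rather than an implementation. The `create`/`extract` names and the suffix table are assumptions.

```python
import shutil

# Archive suffix -> shutil format name. Only a few formats, for brevity.
_FORMATS = {".tar.gz": "gztar", ".tar": "tar", ".zip": "zip"}


def create(source_dir: str, archive: str) -> str:
    """Pack source_dir into `archive`, choosing the format from its suffix."""
    for suffix, fmt in _FORMATS.items():
        if archive.endswith(suffix):
            base = archive[: -len(suffix)]
            # make_archive appends the suffix itself and returns the final path.
            return shutil.make_archive(base, fmt, root_dir=source_dir)
    raise ValueError(f"don't know how to create {archive!r}")


def extract(archive: str, dest: str) -> None:
    """Unpack any supported archive; the format is inferred from the filename."""
    shutil.unpack_archive(archive, dest)
```

The user never names a format or a compressor; they name a file, and the tool works out the rest.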

> Security?

Not the right layer to enforce that in; that's what the [no]user_xattr mount option is for. (Or, alternately, hierarchical xattr namespaces, with everything coming from foreign sources tainted into a quarantine namespace.)

> Wait, what? "fail silently"?

rsync(1) actually prints a ton of warnings (one for each file, in fact) when you tell it to copy EAs but it can't manage either the read or write step. But it doesn't fail. Because, like I said, there's no use-case. rsync(1) mostly isn't used TTY-interactively; it's used as a recurring batch job. If you said you wanted data+EAs the first time, and the remote's sysadmin remounts their computer's filesystem so EAs stop being available, you don't suddenly want to stop syncing the data. You just want a syslog full of errors telling you that you missed the EAs, plus smart logic such that you can message the other sysadmin telling them to fix their box, and then rsync(1) will gracefully just add the now-available EAs to the files rather than having to re-copy the files. Which... it does. So, great!

(I mean, I can see a use-case for rsync being told "these files should have EAs" and then performing its sync in two steps, where it copies the files from the remote to a local cache, and then only moves those files into place in the local dir if the EAs are there. This would be important if e.g. your local dir is a public NFS mirror of the remote and people will immediately start downloading and trusting your copies to be full-fidelity copies. But! This isn't rsync(1)'s job. This isn't the negotiated details of mechanism; this is policy—policy that can be, and should be, entirely enforced outside of rsync(1) itself. What you really want to do is to rsync(1) the remote to a local spool dir, and then have an fsevent listener watching that dir, checking the local copies in it for semantic-level validity, and then hard-linking those files into the NFS-export dir if-and-when they become valid.)
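The promote-when-valid half of that can be sketched in a few lines. This is a single polling pass rather than a real fsevent listener, and `promote_valid` and `has_user_xattrs` are hypothetical names; the validity check is passed in precisely because it is policy. Hard links require the spool and export dirs to be on the same filesystem.

```python
import os
from typing import Callable, List


def promote_valid(
    spool_dir: str, export_dir: str, is_valid: Callable[[str], bool]
) -> List[str]:
    """Hard-link every valid spool file into export_dir; return names promoted.

    Files already present in export_dir are skipped, so repeated passes
    are harmless. (Replacing a stale export copy is left out of this sketch.)
    """
    promoted: List[str] = []
    for name in sorted(os.listdir(spool_dir)):
        src = os.path.join(spool_dir, name)
        dst = os.path.join(export_dir, name)
        if not os.path.isfile(src) or os.path.exists(dst):
            continue
        if is_valid(src):
            os.link(src, dst)  # same inode, no data copied
            promoted.append(name)
    return promoted


def has_user_xattrs(path: str) -> bool:
    """Example policy (Linux only): the file carries at least one user.* EA."""
    return any(attr.startswith("user.") for attr in os.listxattr(path))
```

You'd run rsync(1) into the spool dir as usual, then call something like `promote_valid(spool, export, has_user_xattrs)` from whatever watches the directory. rsync(1) stays oblivious to the whole arrangement.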

> copying ACLs isn't like PASV, it's more like ASCII vs binary mode. trying to auto-detect makes the situation worse.

EAs and ACLs are separate things (one is implemented in terms of the other, but they are separated semantically by almost all programs that handle them, and for good reason.) Allowing rsync(1) to blindly sync ACLs across hosts is a much more complex question than just allowing it to sync userland/"shell" EAs.

My first-pass design would be to just never do it ever, instead just inheriting the local effective ACLs rsync(1) is running under, unless it's clear from nsswitch that the local and remote are part of the same AD/LDAP domain and thus share security objects.

But even then, this gets into the domain of "security considerations for programs when you run them with superuser privileges, allowing them to manipulate security metadata", which is totally separate from the UX considerations of regular userland binaries. SMB has, like, seven interlocking protocols, implemented by smbd(8) through twenty-odd libexec-binaries—some run as root, some setuid, some in userland—required to ensure remote-RPC-exposed objects are seen to have the correct ACLs on all hosts. It's awful and ugly. rsync(1) doesn't need to be that. It's a userland binary, sitting in /usr/bin, for users to sync things and then own those synced copies. It doesn't need to be more than that.

> how about "updating the compression algorithm requires everybody to install a new (de)compressor and until they do they can't open your files"?

It doesn't, though; that's what I was talking about with 'in-file hinting that the previous binary can recognize'. You can create a "wrapper format" that doesn't literally 'wrap' your previous format in a container, but rather embeds itself as an extra block in some conventional place that all the old (de)compressors ignore. In a sense, you're encoding your new metadata using lossless steganography into the old format. This is how e.g. the various versions of ID3 tagging have been implemented: since there are many hardware MP3 players that can't be updated—and MP3 is not an extensible container—each update to the ID3 spec is instead done with a formatting change that those old players will just ignore.

In the case of tar(1), here's one possible implementation of that idea: let's say that all the compressor algorithms tar(1) currently supports have a common notion of (though different instructions for) building a Huffman prefix tree, and of throwing away said Huffman tree. You could just stick instructions to build a special kind of Huffman tree at the beginning of the compressed data, one that is a valid but improbable tree from the compressor's perspective (i.e. something that it would never emit itself, but will accept), and which is represented in the compressed data as a static bytestring (i.e. magic numbers.) Then you can add to libmagic, tar(1) itself, etc. a set of new "compressed tar archive" content-signatures—one for each compressor's instruction-sequence variant, all pointing to the same media-type.
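For a concrete taste of "a conventional place that old decompressors ignore", here is a sketch using a different and much simpler mechanism than the Huffman-tree trick above: the gzip header's optional extra field (FEXTRA, RFC 1952), which conforming decompressors skip over. The subfield ID `tH`, the function names, and the idea of storing a media type there are all assumptions made up for illustration.

```python
import gzip
import struct
import zlib
from typing import Optional

FEXTRA = 0x04  # gzip header flag: an extra field follows the fixed header


def gzip_with_hint(payload: bytes, hint: bytes, subfield_id: bytes = b"tH") -> bytes:
    """Build a gzip member whose header carries `hint` in an extra subfield."""
    deflater = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw deflate
    body = deflater.compress(payload) + deflater.flush()
    # Subfield layout: 2-byte ID, 2-byte little-endian length, data.
    subfield = subfield_id + struct.pack("<H", len(hint)) + hint
    header = (
        b"\x1f\x8b\x08"                     # magic, CM = deflate
        + bytes([FEXTRA])                   # FLG
        + struct.pack("<I", 0)              # MTIME (unset)
        + b"\x00\xff"                       # XFL, OS = unknown
        + struct.pack("<H", len(subfield))  # XLEN
        + subfield
    )
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def read_hint(blob: bytes, subfield_id: bytes = b"tH") -> Optional[bytes]:
    """What a hint-aware reader would do: find our subfield, if present."""
    if blob[:2] != b"\x1f\x8b" or not blob[3] & FEXTRA:
        return None
    (xlen,) = struct.unpack_from("<H", blob, 10)
    extra, pos = blob[12 : 12 + xlen], 0
    while pos + 4 <= len(extra):
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        if extra[pos : pos + 2] == subfield_id:
            return extra[pos + 4 : pos + 4 + length]
        pos += 4 + length
    return None


def old_reader(blob: bytes) -> bytes:
    """Stands in for a decompressor that knows nothing about the hint."""
    return gzip.decompress(blob)
```

`old_reader` gets the original payload back and never sees the hint, while `read_hint` recovers it from the same bytes. That is the ID3-style property in miniature, though it only covers gzip; the Huffman-tree version is what you'd need for one scheme that spans every compressor.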