Hacker News
by oconnor663 2122 days ago
I'm going to shamelessly plug my own library here:

https://github.com/oconnor663/duct.rs

I wanted to solve the same problem, originally in Python (https://github.com/oconnor663/duct.py). It's surprisingly annoying to do pipelines and redirections, compared to how easy they are to do in the shell. Lots of libraries try to address this, but most of them seem to do it by emulating shell syntax within the host language, using operator overloading or other magic like that. I think that's a limiting choice. (For example, can you use `cd` to change the working dir for the left half of a pipeline but not the right half? In Bash you would use a "subshell" for this.) Instead, I think it's sufficient to build an API out of regular objects with regular methods. The result doesn't look like shell code, but it's easier to reason about, and more consistent across different languages.
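To make the `cd` question concrete, here is a minimal sketch of a two-stage pipeline where only the left half runs in a different working directory. It uses Python's standard `subprocess` module rather than duct's own API, and the function name `pipe_with_left_cwd` is made up for illustration.

```python
import subprocess


def pipe_with_left_cwd(left_argv, left_cwd, right_argv):
    """Run `left_argv | right_argv`, with only the left side started in left_cwd.

    Hypothetical helper for illustration; returns
    (left exit status, right exit status, right's stdout as bytes).
    """
    # The left command gets its own working directory via cwd=.
    left = subprocess.Popen(left_argv, cwd=left_cwd, stdout=subprocess.PIPE)
    # The right command inherits the caller's working directory unchanged.
    right = subprocess.Popen(right_argv, stdin=left.stdout, stdout=subprocess.PIPE)
    # Drop our copy of the pipe's read end so the left side sees a broken
    # pipe if the right side exits early.
    left.stdout.close()
    output, _ = right.communicate()
    left_status = left.wait()
    return left_status, right.returncode, output
```

Something like `pipe_with_left_cwd(["ls"], "/tmp", ["sort", "-r"])` would list `/tmp` on the left while `sort` runs wherever the caller already is. Both exit statuses come back to the caller, so either half can be checked for failure.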

2 comments

It can be supported with internal APIs, even without macros:

        Cmds::from_cmd(Cmd(...).current_dir(..))
            .pipe(Cmd(...).current_dir(...))
            .run_cmd(...)

As you can see, it is very verbose, and that's why I chose to hide the lower-level APIs for now.

I like your approach more than duct.rs :)

‘cd’ is a shell builtin, so you couldn’t use ‘cd’ in any of these solutions unless they spawn a shell instance. That would worry me, because at that point you might as well keep a separate .sh file and launch that instead (at least that is more auditable with tools like ShellCheck than any inlined code would be).

As I read the parent comment, the broad context is turning "shell-like behavior" into Rust code. The comment talks about that projection by naming elements of the source (the shell), on the understanding that it really means the resulting image (the Rust code). You can't use the shell's cd, but you can call chdir and set the working directory, and hopefully you can do that for only part of your pipeline.

If they were in fact describing implementation, then I mostly agree - it's likely better to write shell directly than generate it, at least short of treating it seriously as a compilation target.

The problem is you can’t have two different threads operating in different working directories. One “cd” would overwrite another. You could have different processes, but then you’re recreating a shell, in which case you might as well just write it in Bash (for example).

On the one hand, on Linux with the clone system call, you actually can have a "thread" that shares memory, file descriptor table, etc, but not working directory (or chroot):

        If CLONE_FS is not set, the child process works on a copy of
        the filesystem information of the calling process at the time of the
        clone() call. Calls to chroot(2), chdir(2), or umask(2) performed later
        by one of the processes do not affect the other process.

On the other hand: what you say is true of POSIX threads, code built atop clone is unlikely to be portable, etc, etc.

More generally, it's very much the case that the process-global nature of the working directory makes some things tricky. Still, there are ways around that.

If the only pieces that need to reference the working directory are processes you're spawning, the answer is simple - carry a description of the intended working directory through your computation, and actually chdir between the fork and the exec.
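A minimal sketch of that fork / chdir / exec sequence, written with Python's `os` module on a POSIX system. The helper name `spawn_in_dir` is hypothetical, and real code would more likely pass `cwd=` to `subprocess`, which makes the same change in the child before the new program runs.

```python
import os


def spawn_in_dir(argv, cwd):
    """Run argv with working directory cwd, without touching the parent's cwd.

    Hypothetical helper (POSIX only). Returns the child's exit code,
    or -1 if the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: this chdir affects only the child.
        try:
            os.chdir(cwd)
            os.execvp(argv[0], argv)  # on success this never returns
        finally:
            # Reached only if chdir or exec failed; never fall back into
            # the parent's code path from the child.
            os._exit(127)
    # Parent process: its working directory is unchanged.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

The intended directory is just a value passed along until spawn time, so two commands with different directories never compete for the process-global setting.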

If the only pieces that need to reference the working directory are things you will be writing as a part of the current project, you can use "at variants" (openat, fstatat, unlinkat, etc).
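For illustration, Python exposes the "at variants" through the `dir_fd` parameter on functions such as `os.open`, `os.stat` and `os.unlink`, on platforms that support it. The sketch below resolves a file name against an open directory descriptor instead of the working directory; the function name `write_stat_remove` is made up.

```python
import os


def write_stat_remove(dir_path, name, data):
    """Create, stat, and delete `name` relative to dir_path without chdir.

    Hypothetical helper; returns the size in bytes reported by stat.
    """
    # Open the directory itself; relative names below resolve against it.
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Uses openat(2) under the hood.
        fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600,
                     dir_fd=dir_fd)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # fstatat(2) and unlinkat(2) equivalents.
        size = os.stat(name, dir_fd=dir_fd).st_size
        os.unlink(name, dir_fd=dir_fd)
        return size
    finally:
        os.close(dir_fd)
```

Each thread can hold its own directory descriptor, so nothing here depends on the process-wide working directory.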

You can stitch these two together as needed.

What's awkward is if you need to use library code that references the current working directory and does not use "at variants". In that case, you could play some awkward game with a mutex, setting the cwd only for the duration of individual operations and restoring it afterward (although that does require knowing when these operations may be performed).
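A rough sketch of that mutex game in Python. The lock and the `working_dir` name are hypothetical, and it only helps if every piece of code that depends on the working directory goes through the same lock; a thread that reads the cwd without taking it can still observe the temporary value.

```python
import contextlib
import os
import threading

# One process-wide lock guarding the process-wide working directory.
# Reentrant, so nested use within a single thread does not deadlock.
_CWD_LOCK = threading.RLock()


@contextlib.contextmanager
def working_dir(path):
    """Temporarily chdir to path while holding the lock, then restore."""
    with _CWD_LOCK:
        previous = os.getcwd()
        os.chdir(path)
        try:
            yield
        finally:
            # Restore the old directory even if the wrapped call raised.
            os.chdir(previous)
```

A call into cwd-dependent library code then becomes `with working_dir(some_dir): library_call()`, where `library_call` stands in for whatever function needs it.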

Or you could fork your process along the lines you need to draw. Splitting your view of memory raises questions of IPC. In particular, passing language-specific data structures gets complicated, especially if they may contain references. If you can get away with treating everything as streams of bytes passed over actual pipes, it's pretty straightforward. To my mind, this is probably most of what's meant by "shell-script like". While you're recreating part of a shell, it's not very much of one, it should be well contained in the library, and you have room for a much better story around things like error handling. I don't have a sense of whether any of the libraries discussed here address exactly this issue, or whether they do a good job of any of it.
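As a sketch of the fork-plus-bytes approach, the hypothetical helper below runs a callable in a forked child with its own working directory and sends the result back as raw bytes over a pipe. It assumes a POSIX system and a single-threaded parent, and it is not taken from any of the libraries discussed here.

```python
import os


def run_in_dir(path, func):
    """Call func() in a forked child whose cwd is path; return its bytes.

    Hypothetical helper: func must return bytes. Raises RuntimeError if
    the child fails for any reason.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: private copy of memory and of the working directory.
        status = 1
        try:
            os.close(read_fd)
            os.chdir(path)
            data = func()
            with os.fdopen(write_fd, "wb") as out:
                out.write(data)
            status = 0
        finally:
            # Always leave via _exit so the child never runs parent code.
            os._exit(status)
    # Parent: close the write end, then read until the child closes its end.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        data = inp.read()
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        raise RuntimeError("child process failed")
    return data
```

Because only bytes cross the pipe, there is no question of sharing references between the two processes, and a failure in the child surfaces as an ordinary exception in the parent.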