by em-bee 414 days ago
could you go into more detail how (and maybe why) murex and elvish differ?

also, i'd like to know more about how the job control works. that's one of the pain points in elvish, but both are written in go, so maybe there are some ideas that elvish could copy.


Murex and Elvish share a lot of similar design goals. The author of Elvish has given a lot of great talks about his approach to Elvish development, and you can tell a lot of care has gone into its design.

With Murex, I initially took a "let's just experiment until I find something that works" approach, with no fear of writing ugly proof-of-concept code. This allowed me to build a lot of stuff very quickly, and originally it lived in a self-hosted git repository and only solved my own problems. But as the project evolved I realised there was some good stuff in there that was worth sharing. The downside to this approach is that there is some ugliness in its design that has lasted even to the latest version, due to Murex's compatibility promise. However, the latest version of Murex does provide an internally versioned runtime, which means scripts can now pin to a specific version of Murex and not worry about gradual changes over time (even if "gradual over time" in this context literally means "years of compatibility", even before the versioned runtime).

This means that Murex and Elvish might feel like very different shells despite being conceptually quite similar.

I'm a little reluctant to give specific areas where the two shells diverge because both are under active development and thus moving targets. So what might be true today might not be true tomorrow. However, I will say the syntax for each does vary significantly despite being superficially similar.

As for job control, this was part of Murex's early design because it's a feature I used heavily at the time. So the concept of background and foreground processes is woven throughout the core runtime. As with Elvish, Murex doesn't create new UNIX processes for builtins. And for commands that are forked processes, Murex doesn't hand over complete ownership of the TTY to them, so that Murex can still catch the signals. The reason for the latter is that Murex can then add additional hooks to job control, such as returning a list of the files any stopped process has open and how far through reading those files it is. So Murex has needed to re-implement some of the job control logic that would normally be handled by a POSIX kernel. This does result in a lot of additional code, and thus more places for things to go wrong. On balance, I think I made the right tradeoff for Murex. However, if I were to write an entirely new shell from the ground up, I'd probably not do it this way again.
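To make the "open files and how far through them" hook concrete, here is a minimal sketch of one way such information can be gathered on Linux, assuming the /proc filesystem is available. The names `OpenFile`, `parsePos`, and `openFiles` are hypothetical and this is not taken from Murex's source; it only illustrates that the kernel exposes each descriptor's offset in the `pos:` field of /proc/<pid>/fdinfo/<fd>.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// OpenFile describes one file descriptor held by a process (hypothetical type).
type OpenFile struct {
	Fd     string // descriptor number as listed under /proc/<pid>/fd
	Path   string // target of the descriptor's symlink
	Offset int64  // current position, taken from the fdinfo "pos:" field
}

// parsePos extracts the "pos:" field from the text of /proc/<pid>/fdinfo/<fd>.
func parsePos(fdinfo string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(fdinfo))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "pos:") {
			value := strings.TrimSpace(strings.TrimPrefix(line, "pos:"))
			return strconv.ParseInt(value, 10, 64)
		}
	}
	return 0, fmt.Errorf("no pos field found")
}

// openFiles lists the descriptors of pid. Linux only, and it needs
// permission to read /proc/<pid>.
func openFiles(pid int) ([]OpenFile, error) {
	base := filepath.Join("/proc", strconv.Itoa(pid))
	entries, err := os.ReadDir(filepath.Join(base, "fd"))
	if err != nil {
		return nil, err
	}
	var files []OpenFile
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(base, "fd", e.Name()))
		if err != nil {
			continue // descriptor was closed while we were iterating
		}
		info, err := os.ReadFile(filepath.Join(base, "fdinfo", e.Name()))
		if err != nil {
			continue
		}
		pos, err := parsePos(string(info))
		if err != nil {
			continue
		}
		files = append(files, OpenFile{Fd: e.Name(), Path: target, Offset: pos})
	}
	return files, nil
}

func main() {
	// Inspect this process as a stand-in for a stopped job.
	files, err := openFiles(os.Getpid())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, f := range files {
		fmt.Printf("fd %s -> %s (offset %d)\n", f.Fd, f.Path, f.Offset)
	}
}
```

A shell would call something like `openFiles` with the PID of the stopped job rather than its own PID.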

i really like your approach to job control. my hope is that elvish can implement something similar. i am hopeful in that you already managed to overcome the challenges go introduces here, so the elvish devs can potentially take advantage of that.

The limitations here aren’t due to Go. You can define a forked process’s pgid and ctty, which are the two key pieces you need to set to “correctly” support job control.

And in fact Go actually makes it very easy to both catch job control signals raised by the kernel and set those aforementioned parameters when calling the fork syscall.
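As a rough illustration of the signal-catching half of that claim, the sketch below registers for SIGTSTP with `signal.Notify`, sends the signal to its own process, and maps it to an action. The `actionFor` and `catchOwnSIGTSTP` helpers and the action strings are hypothetical, and a real shell would receive the signal from the TTY driver rather than from itself.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// actionFor maps a job control signal to what a shell might do with it.
// The action strings are illustrative only.
func actionFor(sig os.Signal) string {
	switch sig {
	case syscall.SIGTSTP:
		return "suspend foreground job"
	case syscall.SIGINT:
		return "interrupt foreground job"
	case syscall.SIGCHLD:
		return "reap or update child state"
	default:
		return "ignore"
	}
}

// catchOwnSIGTSTP registers for SIGTSTP, sends it to this process, and
// reports the action chosen for it. Because the handler is registered
// before the signal is sent, the process is not stopped by it.
func catchOwnSIGTSTP() (string, error) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGTSTP)
	defer signal.Stop(ch)

	if err := syscall.Kill(syscall.Getpid(), syscall.SIGTSTP); err != nil {
		return "", err
	}
	select {
	case sig := <-ch:
		return actionFor(sig), nil
	case <-time.After(2 * time.Second):
		return "", fmt.Errorf("timed out waiting for SIGTSTP")
	}
}

func main() {
	action, err := catchOwnSIGTSTP()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("caught SIGTSTP, action:", action)
}
```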

The real problem here is that we don’t actually want POSIX-compliant job control, because that would mean builtins inside hot paths would perform significantly worse and we would lose the ability to easily and efficiently share data between commands, such as type annotations, localised variables, etc.

Losing type annotations is a particularly hard problem to solve, and those annotations are also the main reason to use an alternative shell like Murex or Elvish. In fact I’d say having type annotations work across commands is more important than job control.
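A small sketch of why in-process builtins make this easy: when commands are plain functions inside the shell, a type annotation can travel alongside the data instead of being lost in a byte-only pipe between forked processes. Everything here (`Typed`, `Builtin`, `upper`, `linesToJSON`, `runPipeline`, and the "str" and "json" labels) is hypothetical and much simpler than what either shell actually does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// Typed is a hypothetical value passed between builtins: a payload plus
// a type annotation describing how to interpret it.
type Typed struct {
	DataType string
	Payload  []byte
}

// Builtin is a command that runs inside the shell process.
type Builtin func(in Typed) (Typed, error)

// upper transforms the payload and passes the annotation through unchanged.
func upper(in Typed) (Typed, error) {
	out := strings.ToUpper(string(in.Payload))
	return Typed{DataType: in.DataType, Payload: []byte(out)}, nil
}

// linesToJSON converts newline-separated text into a JSON array and
// updates the annotation so the next command knows what it receives.
func linesToJSON(in Typed) (Typed, error) {
	if in.DataType != "str" {
		return Typed{}, fmt.Errorf("linesToJSON: want str input, got %q", in.DataType)
	}
	lines := strings.Split(strings.TrimSpace(string(in.Payload)), "\n")
	b, err := json.Marshal(lines)
	if err != nil {
		return Typed{}, err
	}
	return Typed{DataType: "json", Payload: b}, nil
}

// runPipeline feeds the output of each stage into the next one.
func runPipeline(in Typed, stages ...Builtin) (Typed, error) {
	cur := in
	for _, stage := range stages {
		next, err := stage(cur)
		if err != nil {
			return Typed{}, err
		}
		cur = next
	}
	return cur, nil
}

func main() {
	in := Typed{DataType: "str", Payload: []byte("a\nb\n")}
	out, err := runPipeline(in, upper, linesToJSON)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%s: %s\n", out.DataType, out.Payload) // prints: json: ["A","B"]
}
```

With forked processes the `DataType` field has nowhere to live, which is the data-sharing cost of POSIX-style job control mentioned above.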

So the end result is having to replicate a lot of what you would normally get for free from a POSIX kernel, except this time running inside your shell. In places you’re basically writing kludge after kludge. But whenever I despair about the ugliness of the code I’ve written, I remind myself that this is all running on an emulation of 40 year old mechanical teletypes. So the whole stack is already one giant hot ball of kludges.

the challenges in go is a reference to a discussion in the elvish chat where one participant claimed that go's os.StartProcess() API makes it impossible to implement unix job control with 100% fidelity.

i don't actually know what the issue there is, and maybe there is a way to avoid using os.StartProcess but the point is that murex is not implementing POSIX compliant job control. and that is one way to get around any issues that may exist.

and now having learned how murex handles job control i am happy that elvish avoided implementing POSIX job control so far because this allows rethinking how to approach this.

> this is all running on an emulation of 40 year old mechanical teletypes

isn't that the real issue right there?

i have been wondering if it is not possible to get rid of that emulation layer and provide for a more rich way for programs to interact with the user.

we'll never be able to get rid of the emulation completely, but i wonder if the position in the stack can be moved.

right now it is:

    GUI
    GUI application that emulates a 40 yr old terminal
    shell
    programs running in the shell

how about:

    GUI
    GUI application for commandline programs that provides a rich interface
    modern shell that runs on that interface
    modern programs running in the shell
    or terminal emulation for legacy programs that need it
       legacy programs running with an emulation layer.

the emulation layer could be started by the shell as needed.

to get that emulation layer we only need to port something like tmux onto that new api. there is also a layer that implements job-control for shells that don't support it: https://github.com/yshui/job-security so this can be done without having to reimplement the emulation yet again

> the challenges in go is a reference to a discussion in the elvish chat where one participant claimed that go's os.StartProcess() API makes it impossible to implement unix job control with 100% fidelity.

That's not true. os.StartProcess() takes a pointer to https://pkg.go.dev/os#ProcAttr which then takes a pointer to https://pkg.go.dev/syscall#SysProcAttr

If I recall correctly, either Setctty or Foreground needs to be set to true (I forget which offhand, possibly Foreground, but browsing the source of StartProcess() should reveal that).
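For reference, a minimal sketch of passing those attributes through os.StartProcess(). The `jobProcAttr` helper is hypothetical, and the field names are those of syscall.SysProcAttr on Linux and other UNIX targets. As I understand the syscall documentation, Foreground places the child's process group in the foreground of the terminal identified by Ctty (and implies Setpgid), but treat that as something to verify against the source.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// jobProcAttr builds attributes for a child that joins process group pgid
// (0 means "start a new group whose ID is the child's PID"). When
// foreground is true, the child's group is also made the foreground group
// of the terminal on ttyFd, which must be a descriptor number in the child.
func jobProcAttr(pgid int, foreground bool, ttyFd int) *os.ProcAttr {
	sys := &syscall.SysProcAttr{
		Setpgid: true,
		Pgid:    pgid,
	}
	if foreground {
		sys.Foreground = true
		sys.Ctty = ttyFd
	}
	return &os.ProcAttr{
		// The child inherits stdin, stdout, and stderr as fds 0, 1, and 2.
		Files: []*os.File{os.Stdin, os.Stdout, os.Stderr},
		Sys:   sys,
	}
}

func main() {
	// Background-style launch: new process group, TTY ownership untouched,
	// so this also works when stdin is not a terminal.
	attr := jobProcAttr(0, false, 0)
	proc, err := os.StartProcess("/bin/sh", []string{"sh", "-c", "exit 0"}, attr)
	if err != nil {
		fmt.Fprintln(os.Stderr, "start:", err)
		return
	}
	state, err := proc.Wait()
	if err != nil {
		fmt.Fprintln(os.Stderr, "wait:", err)
		return
	}
	fmt.Println("child finished:", state.String())
}
```

Calling `jobProcAttr(0, true, 0)` instead would ask for the traditional hand-over of the terminal, which only succeeds when fd 0 really is the controlling TTY.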

I don't actually do that in Murex, because I want to add additional hooks to SIGTSTP and I can't do that if I hand ownership of the TTY to the child process.

https://github.com/lmorg/murex/blob/master/lang/exec_unix.go...

But that means that some tools like Helix then break job control in Murex because they don't think Murex supports job control (due to processes being marked non-traditionally). Higher up in that file (line 28 onwards) you can see where I need to force Murex to take ownership of the TTY again as part of the process clean up.

Processes invoked from a shell should also be part of a process group:

https://github.com/lmorg/murex/blob/master/lang/exec_unix.go...

You also need to set the shell to be a process session leader:

https://github.com/lmorg/murex/blob/master/shell/session/ses...

All of this is UNIX (including Linux) specific, so you can see compiler directives at the top of those files to exclude WASM, Windows, and Plan 9 builds. I don't even try to emulate job control on those platforms because it's too much effort to write, test, and maintain.
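For anyone unfamiliar with how Go expresses this, here is a sketch of a build constraint that excludes those targets. The exact expression used in Murex's files may differ, so treat this one as an assumption; `jobControlSupported` is a hypothetical function name.

```go
//go:build !windows && !plan9 && !js

package main

import "fmt"

// jobControlSupported reports whether this build includes the UNIX job
// control code. A sibling file carrying the inverse constraint would
// define the same function returning false.
func jobControlSupported() bool {
	return true
}

func main() {
	fmt.Println("job control compiled in:", jobControlSupported())
}
```

The constraint has to appear before the package clause and be followed by a blank line, otherwise the Go toolchain treats it as an ordinary comment.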

> i have been wondering if it is not possible to get rid of that emulation layer and provide for a more rich way for programs to interact with the user.

Funnily enough, this is something I'm experimenting with in my terminal emulator, though it's very alpha at the moment: https://github.com/lmorg/Ttyphoon

> to get that emulation layer we only need to port something like tmux onto that new api

My terminal emulator does exactly this. It uses tmux control mode so that tmux manages the TTY sessions and the terminal emulator handles the rendering.
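To give a feel for what the rendering side has to consume, here is a small sketch that parses a single `%output` notification as described in the control mode section of the tmux manual, where non-printable characters and backslashes in the value are escaped as three-digit octal. The `Output`, `unescape`, and `parseOutput` names are hypothetical and this is not code from the terminal emulator itself.

```go
package main

import (
	"fmt"
	"strings"
)

// Output is one "%output" notification from tmux control mode.
type Output struct {
	Pane string // pane ID, for example "%1"
	Data string // pane output with octal escapes decoded
}

func isOctal(c byte) bool {
	return c >= '0' && c <= '7'
}

// unescape decodes backslash escapes of the form \ooo (three octal digits).
func unescape(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+3 < len(s) &&
			isOctal(s[i+1]) && isOctal(s[i+2]) && isOctal(s[i+3]) {
			v := int(s[i+1]-'0')<<6 | int(s[i+2]-'0')<<3 | int(s[i+3]-'0')
			b.WriteByte(byte(v))
			i += 3 // skip the three digits just consumed
			continue
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

// parseOutput parses a line of the form "%output <pane-id> <value>".
// It returns false for any other kind of line.
func parseOutput(line string) (Output, bool) {
	const prefix = "%output "
	if !strings.HasPrefix(line, prefix) {
		return Output{}, false
	}
	parts := strings.SplitN(line[len(prefix):], " ", 2)
	if len(parts) != 2 {
		return Output{}, false
	}
	return Output{Pane: parts[0], Data: unescape(parts[1])}, true
}

func main() {
	line := `%output %1 hello\015\012`
	if out, ok := parseOutput(line); ok {
		fmt.Printf("pane %s: %q\n", out.Pane, out.Data)
	}
}
```

In practice the lines would come from the stdout of a tmux client started in control mode (the `-C` flag), read one at a time with a scanner.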

> https://github.com/yshui/job-security so this can be done without having to reimplement the emulation yet again

The problem with a 3rd party tool for job control is that it doesn't work with shell builtins. And the massive value add for shells like Murex and Elvish is their builtins. This is another reason why I didn't want POSIX job control in Murex: I wanted to keep my shell builtins powerful but also allow them to support job control in a way that feels native and transparent to the casual user.