by hnlmorg 1845 days ago
Yeah, I do understand where you're coming from, and I've spent a lot of time considering how I'd re-architect murex to pass raw data across (like PowerShell) rather than marshalling and unmarshalling data on each side of each pipe.

In the end I settled on the design I have because it retains compatibility with the old world while enabling the features of the new world, and it also behaves in a predictable way so it's (hopefully) easy for people to reason about. PowerShell (and other languages with a REPL, like Python, Lisp, etc) still exist for those who want something that's ostensibly a programming environment first and a command line second, and I think trying to compete with the excellent work there wouldn't be sensible given how mature those solutions already are. But for a lot of people, the majority of their command line usage is just chaining existing commands together and parsing text files. Often they want something terser than $LANG, as a lot of command lines are read-once write-many, and thus they're happy to sacrifice a little in language features for the sake of command line productivity. This is the approach murex takes, albeit murex does also try to retain readability despite being succinct (a lack of readability is probably the biggest failing of POSIX shells in the modern era).

What I've built is definitely not going to be everyone's preferred solution, that's for sure. But it works for me and it's open source, so hopefully others find it as useful as I do :)


A good binary format would be best, but JSON is already a step up - at least it's obvious where each value starts and ends. That said, maybe it wouldn't be too hard to offer a binary-serialized JSON format as well (I think BSON is the currently widespread standard)?

On a related note, I wonder if and how a pipe could handle "format negotiation" between processes? I.e. is there a way for a CLI app to indicate it can consume and produce structured binary data? Then the piping layer could let compatible apps talk through an efficient protocol, and for anyone else, it would automatically drop to equivalent JSON (and then maybe binarize it back up, if the next thing in the pipeline can handle it).

> A good binary format would be best, but JSON is already a step up - at least it's obvious where each value starts and ends. That said, maybe it wouldn't be too hard to offer a binary-serialized JSON format as well (I think BSON is the currently widespread standard)?

You can already use BSON. The data is piped in whatever serialisation format it's typed as but the type information is also sent. Builtins then use generic APIs that wrap around STDIN et al which are aware of the underlying serialisation.

So the following works the same regardless of whether example.data is a JSON file, BSON, YAML, TOML or whatever else:

  open example.data | foreach i { out "$i[name] lives at $i[address]" }

The issue is when you want to convert tabulated data like a CSV into JSON (or similar), since you're not just mapping the same structure to a different document syntax (like with JSON, YAML, BSON, etc); you're restructuring data to an entirely new schema. I haven't yet found a reliable way to solve that problem.
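
To make the CSV problem concrete, here is a minimal sketch in Python (not murex code; the sample data and the three function names are made up for illustration). The same two-row table maps onto at least three reasonable JSON shapes, and nothing in the CSV itself says which one the next command in the pipeline expects:

```python
import csv
import io
import json

# Hypothetical sample data, reusing the field names from the example above.
CSV_TEXT = "name,address\nAlice,1 Main St\nBob,2 High St\n"


def csv_as_rows(text):
    """Array of arrays: the header is just the first row."""
    return list(csv.reader(io.StringIO(text)))


def csv_as_records(text):
    """Array of objects: one object per data row, keyed by the header."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def csv_as_columns(text):
    """Object of arrays: one array per column, keyed by the header."""
    rows = csv_as_rows(text)
    header, body = rows[0], rows[1:]
    return {name: [row[i] for row in body] for i, name in enumerate(header)}


if __name__ == "__main__":
    # Three valid JSON documents from one CSV; each implies a different schema.
    for convert in (csv_as_rows, csv_as_records, csv_as_columns):
        print(convert.__name__, json.dumps(convert(CSV_TEXT)))
```

Going between JSON and YAML never forces this choice because the tree of maps and lists is the same on both sides; only the syntax differs.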

> On a related note, I wonder if and how a pipe could handle "format negotiation" between processes? I.e. is there a way for a CLI app to indicate it can consume and produce structured binary data? Then the piping layer could let compatible apps talk through an efficient protocol, and for anyone else, it would automatically drop to equivalent JSON (and then maybe binarize it back up, if the next thing in the pipeline can handle it).

That isn't that far removed from how murex already works. Supported tools can use common APIs to convert STDIN into memory structures, and similarly convert them back to their serialisation file formats. So if you have a tool like `cat` in your pipeline, it can use the pipe as a standard POSIX byte stream, but murex-aware software can treat the pipeline as structured data. The drawback of this is that if you're reading from a POSIX pipe into a murex command, you might need to add casting information (see below). But the benefit is you're not throwing away 40 years of CLI development:

  # Using a POSIX tool to read the data file:
  # casting is needed so `foreach` knows to iterate through a JSON object

  cat example.json | cast json | foreach { ... }



  # Using a murex tool to read the data file:
  # no casting is needed because `open` passes that type information down the pipe

  open example.json | foreach { ... }

(`open` here isn't doing anything clever, it just "detects" the JSON file based on the file extension -- or Content-Type header if the file is an HTTP URI)
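
The two pipelines above can be modelled with a rough sketch in Python. This is not murex's implementation: `TypedStream`, the extension table, the `"generic"` label and the rule that a JSON object is iterated by its values are all assumptions made for illustration. The point is only that the bytes travel unchanged and a type tag travels with them, so a consumer can pick a decoder.

```python
import json
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical extension-to-type table; murex's real table is not shown here.
EXTENSION_TYPES = {".json": "json"}
GENERIC = "generic"  # placeholder label for an untyped byte stream


@dataclass
class TypedStream:
    data_type: str   # the type information sent alongside the data
    payload: bytes   # the serialised data, untouched


def open_file(name, payload):
    """Stand-in for `open`: tag the bytes using only the file extension."""
    suffix = PurePosixPath(name).suffix.lower()
    return TypedStream(EXTENSION_TYPES.get(suffix, GENERIC), payload)


def posix_tool(payload):
    """Stand-in for `cat`: bytes in, bytes out, no type information."""
    return TypedStream(GENERIC, payload)


def cast(stream, data_type):
    """Stand-in for `cast`: relabel the stream without changing the bytes."""
    return TypedStream(data_type, stream.payload)


def foreach(stream):
    """Stand-in for `foreach`: choose how to iterate from the type tag."""
    if stream.data_type == "json":
        doc = json.loads(stream.payload)
        # Assumption: objects are iterated by value, arrays by element.
        yield from (doc.values() if isinstance(doc, dict) else doc)
    else:
        # Untyped stream: fall back to plain lines of text.
        yield from stream.payload.decode().splitlines()


if __name__ == "__main__":
    raw = b'[{"name": "Alice"}, {"name": "Bob"}]'
    print(list(foreach(open_file("example.json", raw))))     # typed by `open`
    print(list(foreach(cast(posix_tool(raw), "json"))))      # typed by `cast`
    print(list(foreach(posix_tool(raw))))                    # untyped: one text line
```

In this model the first two calls yield the same two records, while the third sees only a single line of text, which is why the `cat` pipeline needs the explicit `cast`.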