> CLI tool and python library that converts the output of popular command-line tools and file-types to JSON or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts
Take a plain text table, convert it into an SQL table and run an SQL query:
» ps aux | select USER, count(*) GROUP BY USER
USER count(*)
_installcoordinationd 1
_locationd 4
_mdnsresponder 1
_netbios 1
_networkd 1
_nsurlsessiond 2
_reportmemoryexception 1
_softwareupdate 3
_spotlight 5
_timed 1
_usbmuxd 1
_windowserver 2
lmorg 349
root 134
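For anyone curious how the "plain text table to SQL" trick can work in principle, here's a rough Python sketch using the standard `sqlite3` module. This is not murex's implementation; the `table_to_sqlite` helper, the `stdin` table name and the sample rows are all made up for illustration.

```python
import sqlite3

def table_to_sqlite(text, table="stdin"):
    """Load a whitespace-delimited table (header on line one) into in-memory SQLite."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    headers = lines[0].split()
    conn = sqlite3.connect(":memory:")
    # Quote column names so headers such as %CPU are still valid identifiers.
    cols = ", ".join(f'"{h}"' for h in headers)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in headers)
    for ln in lines[1:]:
        # The last column (e.g. COMMAND) may contain spaces, so cap the splits.
        fields = ln.split(None, len(headers) - 1)
        fields += [None] * (len(headers) - len(fields))
        conn.execute(f'INSERT INTO "{table}" VALUES ({marks})', fields)
    return conn

# Made-up sample in the shape of `ps` output.
SAMPLE = """\
USER PID COMMAND
root 1 /sbin/launchd
lmorg 501 murex
lmorg 502 ps aux
"""

conn = table_to_sqlite(SAMPLE)
rows = conn.execute(
    'SELECT "USER", count(*) FROM stdin GROUP BY "USER" ORDER BY "USER"'
).fetchall()
print(rows)
```

Everything is stored as text here; a real tool would presumably want to infer numeric columns as well.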
The builtins usually print human readable output when STDOUT is a TTY, or JSON (or jsonlines) when STDOUT is a pipe.
» fid-list:
FID Parent Scope State Run Mode BG Out Pipe Err Pipe Command Parameters
590 0 0 Executing Normal no out err fid-list (subject to change)
» fid-list: | cat
["FID","Parent","Scope","State","RunMode","BG","OutPipe","ErrPipe","Command","Parameters"]
[615,0,0,"Executing","Normal",false,"out","err",{},"(subject to change) "]
[616,0,0,"Executing","Normal",false,"out","err",{},"cat"]
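The TTY-or-pipe switch is a common pattern and easy to sketch. Below is a minimal Python version, assuming a hypothetical `render` helper (nothing here is taken from murex's source): it pads columns for a terminal and otherwise emits one compact JSON array per line.

```python
import io
import json
import sys

def render(rows, headers, stream=None):
    """Write an aligned table to a TTY, or one JSON array per line to anything else."""
    if stream is None:
        stream = sys.stdout
    table = [headers, *rows]
    if stream.isatty():
        # Human readable: pad every column to its widest cell.
        widths = [max(len(str(v)) for v in col) for col in zip(*table)]
        for row in table:
            cells = (str(v).ljust(w) for v, w in zip(row, widths))
            stream.write("  ".join(cells).rstrip() + "\n")
    else:
        # Machine readable: compact JSON, one row per line.
        for row in table:
            stream.write(json.dumps(row, separators=(",", ":")) + "\n")

# A StringIO is not a TTY, so this takes the JSON branch.
buf = io.StringIO()
render([[615, 0, "Executing", False]], ["FID", "Parent", "State", "BG"], buf)
print(buf.getvalue(), end="")
```

Run directly in a terminal with `stream` left as the default and you'd get the padded table instead.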
and you can reformat to other data types, e.g.
» fid-list: | format csv
FID,Parent,Scope,State,RunMode,BG,OutPipe,ErrPipe,Command,Parameters
703,0,0,Executing,Normal,false,out,err,map[],(subject to change)
704,0,0,Executing,Normal,false,out,err,map[],csv
and query data within those data structures using tools that are aware of that structural format. E.g. GitHub's API returns a JSON object and we can filter through it to return just the issue IDs and titles:
» open https://api.github.com/repos/lmorg/murex/issues | foreach issue { printf "%2s: %s\n" $issue[number] $issue[title] }
348: Potential regression bug in `fg`
347: Version 2.2 Release
342: Install on Fedora 34 fails (issue with `go get` + `bzr`)
340: `append` and `prepend` should `ReadArrayWithType`
318: Establish a testing framework that can work against the compiled executable, sending keystrokes to it
316: struct elements should alter data type to a primitive
311: No autocompletions for `openagent` currently exist
310: Supprt for code blocks in not: `! { code }`
308: `tabulate` leaks a zero length string entry when used against `rsync --help`
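For comparison, here's roughly what that one-liner does, written out in Python. It's only a sketch: `format_issues` is a hypothetical helper, and it works on an inline sample (two of the titles above) rather than calling the API.

```python
import json

def format_issues(payload):
    """Return 'number: title' lines from a JSON array of issue objects."""
    issues = json.loads(payload)
    # Mirrors the printf "%2s: %s\n" format from the murex example.
    return [f"{str(issue['number']):>2}: {issue['title']}" for issue in issues]

# Inline stand-in for the API response, using two titles from the output above.
sample = json.dumps([
    {"number": 348, "title": "Potential regression bug in `fg`"},
    {"number": 347, "title": "Version 2.2 Release"},
])

for line in format_issues(sample):
    print(line)
```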
As much as this is an improvement in many ways, using JSON for this feels like it doubles down on part of the problem with the current standard. Every tool is rendering JSON only for the next tool in the pipeline to parse it back out.
Yeah, I do understand where you're coming from and I've spent a lot of time considering how I'd re-architect murex to pass raw data across (like PowerShell) rather than marshalling and unmarshalling data each side of each pipe.
In the end I settled on the design I have because it retains compatibility with the old world while enabling the features of the new world, but it also behaves in a predictable way so it's (hopefully) easy for people to reason about. PowerShell (and other languages with a REPL, like Python, LISP, etc) still exist for those who want something that's ostensibly a programming environment first and a command line second, and I think trying to compete with the excellent work there wouldn't be sensible given how mature those solutions already are. But for a lot of people, the majority of their command line usage is just chaining existing commands together and parsing text files. Often they want something terser than $LANG, as a lot of command lines are read-once write-many, and thus are happy to sacrifice a little in language features for the sake of command line productivity. This is the approach murex takes. Albeit murex does also try to retain readability despite being succinct (which is probably the biggest failing of POSIX shells in the modern era).
What I've built is definitely not going to be everyone's preferred solution, that's for sure. But it works for me and it's open source so hopefully others find it as useful as I do :)
A good binary format would be best, but JSON is already a step up - at least it's obvious where each value starts and ends. That said, maybe it wouldn't be too hard to offer a binary-serialized JSON format as well (I think BSON is the currently widespread standard)?
On a related note, I wonder if and how a pipe could handle "format negotiation" between processes? I.e. is there a way for a CLI app to indicate it can consume and produce structured binary data? Then the piping layer could let compatible apps talk through an efficient protocol, and for anyone else, it would automatically drop to equivalent JSON (and then maybe binarize it back up, if the next thing in the pipeline can handle it).
> A good binary format would be best, but JSON is already a step up - at least it's obvious where each value starts and ends. That said, maybe it wouldn't be too hard to offer a binary-serialized JSON format as well (I think BSON is the currently widespread standard)?
You can already use BSON. The data is piped in whatever serialisation format it's typed as but the type information is also sent. Builtins then use generic APIs that wrap around STDIN et al which are aware of the underlying serialisation.
So the following works the same regardless of whether example.data is a JSON file, BSON, YAML, TOML or whatever else:
open example.data | foreach i { out "$i[name] lives at $i[address]" }
The issue is when you want to convert tabulated data like a CSV into JSON (or similar) since you're not just mapping the same structure to a different document syntax (like with JSON, YAML, BSON, etc), you're restructuring data to an entirely new schema. I haven't yet found a reliable way to solve that problem.
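To illustrate both points, here's a hedged Python sketch. `DECODERS`, `read_typed` and `csv_to_records` are invented names, not murex APIs, and the sample data is made up. The first half shows the "type travels with the data" idea as a simple registry; the second shows why CSV is different: mapping rows to objects is a choice of schema, not just a change of syntax.

```python
import csv
import io
import json

# Hypothetical registry: data-type name -> decoder returning Python structures.
DECODERS = {
    "json": json.loads,
    "jsonl": lambda raw: [json.loads(ln) for ln in raw.splitlines() if ln.strip()],
}

def read_typed(raw, data_type):
    """Decode raw text using whichever decoder matches the declared type."""
    decoder = DECODERS.get(data_type)
    if decoder is None:
        raise ValueError(f"no decoder registered for {data_type!r}")
    return decoder(raw)

def csv_to_records(raw):
    """One possible mapping out of several: header row becomes the keys."""
    return list(csv.DictReader(io.StringIO(raw)))

# Made-up sample data.
doc = '[{"name": "Ada", "address": "12 Example Street"}]'
for i in read_typed(doc, "json"):
    print(f"{i['name']} lives at {i['address']}")

table = "name,address\nAda,12 Example Street\n"
print(csv_to_records(table))
```

Note that `csv_to_records` had to decide that the first row is a header and that every value is a string. A different tool could just as reasonably emit an array of arrays, which is the ambiguity described above.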
> On a related note, I wonder if and how a pipe could handle "format negotiation" between processes? I.e. is there a way for a CLI app to indicate it can consume and produce structured binary data? Then the piping layer could let compatible apps talk through an efficient protocol, and for anyone else, it would automatically drop to equivalent JSON (and then maybe binarize it back up, if the next thing in the pipeline can handle it).
That isn't that far removed from how murex already works. Supported tools can use common APIs to convert the STDIN into memory structures, and similarly convert them back to their serialisation file formats. So if you have a tool like `cat` in your pipeline, it can use the pipe as a standard POSIX byte stream, but murex-aware software can treat the pipeline as structured data. The drawback of this is if you're reading from a POSIX pipe into a murex command, you might need to add casting information (see below). But the benefit is you're not throwing away 40 years of CLI development:
# Using a POSIX tool to read the data file:
# casting is needed so `foreach` knows to iterate through a JSON object
cat example.json | cast json | foreach { ... }
# Using a murex tool to read the data file:
# no casting is needed because `open` passes that type information down the pipe
open example.json | foreach { ... }
(`open` here isn't doing anything clever, it just "detects" the JSON file based on the file extension -- or Content-Type header if the file is a HTTP URI)
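That kind of detection is simple enough to sketch in a few lines of Python. The `detect_type` function and both lookup tables below are assumptions for illustration, not murex's actual mapping: the Content-Type header wins if one is supplied and recognised, otherwise the file extension is used.

```python
import os
from urllib.parse import urlparse

# Illustrative tables only; a real shell would carry a much longer list.
EXTENSIONS = {".json": "json", ".jsonl": "jsonl", ".yaml": "yaml",
              ".yml": "yaml", ".toml": "toml", ".csv": "csv", ".bson": "bson"}
MIME_TYPES = {"application/json": "json", "text/csv": "csv"}

def detect_type(source, content_type=None, default="unknown"):
    """Guess a data type from a Content-Type header, falling back to the extension."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in MIME_TYPES:
            return MIME_TYPES[mime]
    path = urlparse(source).path if "://" in source else source
    ext = os.path.splitext(path)[1].lower()
    return EXTENSIONS.get(ext, default)

print(detect_type("example.json"))
print(detect_type("https://api.github.com/repos/lmorg/murex/issues",
                  "application/json; charset=utf-8"))
```

When neither source of information matches, the caller gets `default` back, which is the situation where an explicit cast would be needed.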
Protobuf requires both ends of the comms to agree on the same schema. You'd need something that transmits key names, like JSON, YAML, TOML etc. If you wanted a binary format then you could send BSON (binary JSON), and murex does already support this. But pragmatically, a standard command line (or even your average shell script) isn't going to be consuming enough data for the difference between JSON and BSON to noticeably affect the throughput of a pipe.
Worst case scenario, you're dealing with gigabytes or more of data, and then you'd want a streamable format like jsonlines, where each command in a pipeline can run concurrently without waiting for the file to EOF before processing it. Most binary serialisations aren't well optimised for those situations.
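To make the streaming point concrete, here's a small Python sketch using generators. The function names are mine and the sample records are made up. Each stage pulls one record at a time, so nothing waits for EOF and memory use stays flat regardless of input size.

```python
import io
import json

def read_jsonlines(stream):
    """Yield one decoded record per non-empty line, as soon as it is read."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

def where(records, predicate):
    """Lazily filter records; nothing is buffered."""
    return (r for r in records if predicate(r))

def write_jsonlines(records, stream):
    """Serialise each record as soon as it arrives."""
    for r in records:
        stream.write(json.dumps(r) + "\n")

# Made-up input; in a real pipeline the source would be sys.stdin.
src = io.StringIO('{"user":"root"}\n{"user":"lmorg"}\n{"user":"root"}\n')
out = io.StringIO()
write_jsonlines(where(read_jsonlines(src), lambda r: r["user"] == "root"), out)
print(out.getvalue(), end="")
```

A single large JSON array can't be handled this way with `json.loads`, since the closing `]` has to arrive before anything can be decoded.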
I like this approach:
https://github.com/kellyjonbrazil/jc
> CLI tool and python library that converts the output of popular command-line tools and file-types to JSON or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts