
> A good binary format would be best, but JSON is already a step up - at least it's obvious where each value starts and ends. That said, maybe it wouldn't be too hard to offer a binary-serialized JSON format as well (I think BSON is the currently widespread standard)?

You can already use BSON. Data travels down the pipe in whatever serialisation format it is typed as, and the type information is sent along with it. Builtins then use generic APIs that wrap STDIN et al. and are aware of the underlying serialisation.

So the following works the same regardless of whether example.data is a JSON file, BSON, YAML, TOML or whatever else:

  open example.data | foreach i { out "$i[name] lives at $i[address]" }
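
To illustrate the idea of type information travelling with the data, here is a rough Python sketch. It is not murex's actual implementation: the `DECODERS` registry, `foreach` and `describe` are hypothetical names, and JSON Lines stands in for a second serialisation only because the standard library can parse it.

```python
import json


def decode_json(raw: bytes):
    """Decode a JSON document holding an array of objects."""
    return json.loads(raw)


def decode_jsonl(raw: bytes):
    """Decode JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Hypothetical registry: type name -> decoder. Illustrative only.
DECODERS = {"json": decode_json, "jsonl": decode_jsonl}


def foreach(type_name: str, raw: bytes):
    """Yield each element of the payload, whatever its serialisation."""
    try:
        decode = DECODERS[type_name]
    except KeyError:
        raise ValueError(f"no decoder registered for type {type_name!r}") from None
    yield from decode(raw)


def describe(type_name: str, raw: bytes):
    """Same loop body as the shell example, independent of the format."""
    return [f"{i['name']} lives at {i['address']}" for i in foreach(type_name, raw)]
```

The caller only ever names the type; the loop body never needs to know which syntax the bytes were written in.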

The difficulty comes when you want to convert tabulated data like a CSV into JSON (or similar). There you're not just mapping the same structure to a different document syntax (as with JSON, YAML, BSON, etc.); you're restructuring the data to an entirely new schema. I haven't yet found a reliable way to solve that problem.
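
A small Python sketch of why the tabular case is ambiguous: the same CSV text has at least two reasonable JSON shapes, and nothing in the file says which one is wanted. The function names are made up for illustration.

```python
import csv
import io


def csv_as_records(text: str):
    """One object per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def csv_as_rows(text: str):
    """An array of arrays, with the header kept as the first row."""
    return list(csv.reader(io.StringIO(text)))
```

Both results serialise to valid JSON, but they are different schemas. Every value also comes back as a string, so a converter would additionally have to guess which columns are numbers or booleans.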

> On a related note, I wonder if and how a pipe could handle "format negotiation" between processes? I.e. is there a way for a CLI app to indicate it can consume and produce structured binary data? Then the piping layer could let compatible apps talk through an efficient protocol, and for anyone else, it would automatically drop to equivalent JSON (and then maybe binarize it back up, if the next thing in the pipeline can handle it).

That isn't far removed from how murex already works. Supported tools can use common APIs to convert STDIN into in-memory structures, and similarly convert them back to their serialisation formats. So if you have a tool like `cat` in your pipeline, it can use the pipe as a standard POSIX byte stream, while murex-aware software can treat the pipeline as structured data. The drawback is that if you're reading from a POSIX pipe into a murex command, you might need to add casting information (see below). But the benefit is you're not throwing away 40 years of CLI development:

  # Using a POSIX tool to read the data file:
  # casting is needed so `foreach` knows to iterate through a JSON object

  cat example.json | cast json | foreach { ... }



  # Using a murex tool to read the data file:
  # no casting is needed because `open` passes that type information down the pipe

  open example.json | foreach { ... }

(`open` here isn't doing anything clever, it just "detects" the JSON file based on the file extension -- or the Content-Type header if the file is an HTTP URI)
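
The difference between the two pipelines can be modelled roughly as below. This is a Python sketch under stated assumptions, not how murex is written: `Pipe`, `detect_type`, `open_like`, `cat_like` and the lookup tables are all hypothetical, and preferring the Content-Type header over the extension whenever a header is supplied is my own simplification.

```python
import os
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Illustrative lookup tables; a real shell would likely know more types.
EXTENSION_TYPES = {".json": "json", ".yaml": "yaml", ".yml": "yaml",
                   ".toml": "toml", ".csv": "csv"}
CONTENT_TYPES = {"application/json": "json", "text/csv": "csv"}


@dataclass
class Pipe:
    """A byte stream plus an optional type label (None = plain POSIX pipe)."""
    data: bytes
    type_name: Optional[str] = None


def detect_type(location: str, content_type: Optional[str] = None) -> Optional[str]:
    """Guess a type from a Content-Type header if given, else the extension."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in CONTENT_TYPES:
            return CONTENT_TYPES[mime]
    extension = os.path.splitext(urlparse(location).path)[1].lower()
    return EXTENSION_TYPES.get(extension)


def open_like(location: str, data: bytes, content_type: Optional[str] = None) -> Pipe:
    """Model of a type-aware reader: the label travels with the data."""
    return Pipe(data, detect_type(location, content_type))


def cat_like(data: bytes) -> Pipe:
    """Model of a POSIX tool: bytes only, no type label."""
    return Pipe(data)


def cast(pipe: Pipe, type_name: str) -> Pipe:
    """Attach (or override) the type label without touching the bytes."""
    return Pipe(pipe.data, type_name)
```

In this model `cast(cat_like(data), "json")` and `open_like("example.json", data)` produce the same labelled pipe, which is why the two shell pipelines above behave identically once the cast is added.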