by codedokode 1320 days ago
I think that JSON is a bad choice here.

It is obvious that CLI commands should produce machine-readable output because they are often used in scripts, and accept machine-readable input as well. Using arbitrary text output was a mistake because it is difficult to parse, especially when spaces and non-ASCII characters are present.

A good choice would be a format that is easily parsed by programs but still readable by the user. JSON is a bad choice here because it is hard to read.

In my opinion, something formatted with pipes, quotes and spaces would be better:

    eth0:
      ip: 127.15.34.23
      flags: BROADCAST|UNICAST
      mtu: 1500
      name: """"Gigabit" by Network Interfaces Inc."""
Note that the format I have proposed here is machine-readable, somewhat human-readable and somewhat parseable by line-oriented tools like grep. Therefore there might be no need for switches to choose output format. It is also relatively easy to produce without any libraries.
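
To make the "machine-readable" claim concrete, here is a minimal Python sketch of a parser for the format shown above. The comment does not specify the grammar, so the rules here are assumptions: an unindented line ending in `:` opens a record, indented lines are `key: value` pairs, and a value wrapped in triple quotes is taken literally. The function name `parse_records` is made up for illustration.

```python
import json

def parse_records(text):
    """Parse the indented 'key: value' sketch format into nested dicts."""
    records = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not line.startswith(" "):
            # An unindented line such as 'eth0:' opens a new record.
            current = records.setdefault(line.rstrip().rstrip(":"), {})
            continue
        if current is None:
            raise ValueError("indented line before any record header")
        key, _, value = line.strip().partition(": ")
        # Assumed quoting rule: a triple-quoted value is taken literally.
        if len(value) >= 6 and value.startswith('"""') and value.endswith('"""'):
            value = value[3:-3]
        current[key] = value
    return records

SAMPLE = "\n".join([
    "eth0:",
    "  ip: 127.15.34.23",
    "  flags: BROADCAST|UNICAST",
    "  mtu: 1500",
    '  name: """"Gigabit" by Network Interfaces Inc."""',
])

record = parse_records(SAMPLE)
print(json.dumps(record, indent=2))
```

All values stay strings in this sketch; a caller that knows `flags` is a set could split it on `|` itself, since a single flag would be indistinguishable from a plain string.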

Regarding the idea of outputting data in /proc or /sys in JSON format, I think this is wrong as well. It would mean that reading data about multiple processes requires a lot of JSON formatting and parsing. Instead of parsing /proc and /sys directly, applications should use libraries distributed with the kernel, and reading the data directly should be discouraged, because currently /proc and /sys are just a kind of undocumented API.

Also, I wanted to note that I dislike the jq utility. Instead of using JSONPath it uses its own bespoke query language that I constantly fail to remember.

9 comments

Some alternative ideas for making JSON more readable:

- Pipe into gron (https://github.com/tomnomnom/gron) to get a `foo.bar.baz = val` kind of syntax.

- Pipe into visidata (https://www.visidata.org/) to get a spreadsheet-like editable view.
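
For readers who have not seen the `foo.bar.baz = val` style, here is a rough Python sketch of the idea behind it. This is an imitation of the output style only, not gron's implementation: it assumes every key is a plain identifier and does not try to reproduce gron's handling of unusual keys. The function name `flatten` is hypothetical.

```python
import json

def flatten(value, path="json"):
    """Yield 'path = value;' lines for a decoded JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are written back out as JSON literals.
        yield f"{path} = {json.dumps(value)};"

lines = list(flatten({"a": {"b": [1, "x"]}}))
print("\n".join(lines))
```

Because every line carries its full path, the result can be filtered with grep without losing context.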

Hi there - `jc` author here. `jc` can also output in YAML format with the `-y` flag. It is fairly trivial to add other options in the future since `jc` just turns the text into objects which can be serialized to many different formats.

For example:

    % jc -y date
    ---
    year: 2022
    month: Nov
    month_num: 11
    day: 3
    weekday: Thu
    weekday_num: 4
    hour: 9
    hour_24: 9
    minute: 0
    second: 22
    period: AM
    timezone: PDT
    utc_offset:
    day_of_year: 307
    week_of_year: 44
    iso: '2022-11-03T09:00:22'
    epoch: 1667491222
    epoch_utc:
    timezone_aware: false
Great, but I would prefer to avoid YAML because it is very complicated and difficult to parse.
> In my opinion, something formatted with pipes, quotes and spaces would be better:

Just pipe it into a JSON-to-YAML script like this:

    #!/usr/bin/python3
    # Read JSON from stdin and write it to stdout as YAML.
    import json
    import sys
    from ruamel.yaml import YAML
    YAML().dump(json.load(sys.stdin), sys.stdout)
Following the project documentation, you quickly arrive at:

  jc dig example.com | jq
  [
    {
      "id": 30081,
      "opcode": "QUERY",
      "status": "NOERROR",
      "flags": [
        "qr",
        "rd",
        "ra"
      ],
      "query_num": 1,
      "answer_num": 1,
      "authority_num": 0,
      "additional_num": 1,
      "opt_pseudosection": {
        "edns": {
          "version": 0,
          "flags": [],
          "udp": 4096
        }
      },
      "question": {
        "name": "example.com.",
        "class": "IN",
        "type": "A"
      },
      "answer": [
        {
          "name": "example.com.",
          "class": "IN",
          "type": "A",
          "ttl": 56151,
          "data": "93.184.216.34"
        }
      ],
      "query_time": 0,
      "server": "192.168.1.254#53(192.168.1.254)",
      "when": "Thu Nov 03 14:06:40 CET 2022",
      "rcvd": 56,
      "when_epoch": 1667480800,
      "when_epoch_utc": null
    }
  ]
Rather readable to my mind. And I guess you can fairly easily transform it to your preferred human-readable output format.
For me there are too many quotes and brackets. My proposed format can also be converted to JSON if necessary.
Also note that you are looking at plaintext output here. By default `jc` and other JSON filtering tools do syntax highlighting when outputting to the terminal so it's actually quite easy to read JSON these days.
> A good choice would be a format that is easily parsed by programs but still readable by the user.

I think the powershell approach is a good one here too: powershell commands output binary streams of objects rather than text, and it is powershell itself that has several standard ways of producing human-readable output, most of which are automatic (but easily tweaked with an extra pipe or two). Standard human-readable forms are nice, and even though they are standardized there is no need to parse them back into objects, because the data is already passed as objects. The display formats can therefore favor "pretty" over "parse-able" (for example, by truncating long columns with an ellipsis `…`).

Not binary streams of serialized objects, but arrays of pointers to live objects in one process's memory. No serialization / deserialization, binary or text, or piping between processes, just passing pointers to live objects between cmdlets in the same address space. That's quite different and vastly more efficient than serializing and deserializing text or binary data between every step in different processes connected by pipes.

https://en.wikipedia.org/wiki/PowerShell#Pipeline

>As with Unix pipelines, PowerShell pipelines can construct complex commands, using the | operator to connect stages. However, the PowerShell pipeline differs from Unix pipelines in that stages execute within the PowerShell runtime rather than as a set of processes coordinated by the operating system. Additionally, structured .NET objects, rather than byte streams, are passed from one stage to the next. Using objects and executing stages within the PowerShell runtime eliminates the need to serialize data structures, or to extract them by explicitly parsing text output. An object can also encapsulate certain functions that work on the contained data, which become available to the recipient command for use. For the last cmdlet in a pipeline, PowerShell automatically pipes its output object to the Out-Default cmdlet, which transforms the objects into a stream of format objects and then renders those to the screen.

The thing to understand with PowerShell is that the way it pipelines objects is enabled by the fact that it is all happening in-process within one .NET runtime. It is significantly more difficult to achieve anything similar with several independent processes being piped together.
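
The difference can be illustrated with a small Python sketch. It is only an analogy, not PowerShell: generator stages stand in for cmdlets, and a JSON-lines string stands in for a Unix pipe. The stage names `list_interfaces` and `where_mtu_above` and the sample data are invented for the example.

```python
import json

def list_interfaces():
    # Hypothetical source stage: yields live dict objects.
    yield {"name": "eth0", "mtu": 1500}
    yield {"name": "lo", "mtu": 65536}

def where_mtu_above(items, limit):
    # Filter stage: works on whatever objects it is handed.
    return (item for item in items if item["mtu"] > limit)

# In-process pipeline: the same objects are passed by reference,
# so no stage has to serialize or parse anything.
in_process = list(where_mtu_above(list_interfaces(), 2000))

# Pipe-style pipeline, simulated with text: the producer serializes
# each object and the consumer has to parse it again.
wire = "\n".join(json.dumps(item) for item in list_interfaces())
parsed = (json.loads(line) for line in wire.splitlines())
across_pipe = list(where_mtu_above(parsed, 2000))

print(in_process == across_pipe)
```

Both pipelines produce the same result; the second one pays for an encode and a decode at every stage boundary, and both sides must agree on the wire format.
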
Well, yes, powershell takes some shortcuts and has the advantage that .NET has a strong object system.

If you were to build it from scratch with the idea of "shared nothing" applications similar to the unix model with text files, it's not that much more difficult with just about any sort of object or message broker. You could easily imagine a world with a dbus based "REPL"/shell, for instance. Or, if you still want to focus on unix-style streams/files between processes, a different approach would be something like BSON streams (though it would still have some serialization/deserialization overhead).

I also dislike jq, but this is a bit of a non-issue IMHO. You could in theory add any kind of output transformer. The codebase doesn't seem to be optimized for that yet, but it should be trivial to add.
If you use something other than JSON you'd have to wait until every app you want to use chooses to update to support your preferred format. That might take a while. Wouldn't it be better to use JSON for the output, as that's an acceptable input to lots and lots of applications already, and if you want to read the output just pass it to an app that converts from JSON to "something formatted with pipes, quotes and spaces"?
On the other hand, if a machine-readable format is adopted, it will be used for many years or even decades. So instead of making a quick hack that everybody will regret later it might be better to spend some time comparing different options.
JSON is the lazy choice. I particularly dislike quoting keys (variable names). Relaxed JSON (http://www.relaxedjson.org), for one, allows unquoted keys. But that's just a small step - I am sure we (the community) could do better.
>> something formatted with pipes, quotes and spaces would be better

How well would this format handle deeply nested structures? It seems like it would require a lot of space characters compared to nesting open and close characters: {} or () or []

How would escaping pipes, quotes, and spaces work to represent those character literals?

There are already numerous structured text formats: JSON, XML, S-expressions, YAML, TOML, EDN, and many more. Wouldn't this be yet another format? (https://xkcd.com/927/)

Dare I suggest Jevko[0] as yet another alternative?

  eth0 [
    ip [127.15.34.23]
    flags [[BROADCAST][UNICAST]]
    mtu [1500]
    name ["Gigabit" by Network Interfaces Inc.]
  ]
This is one of the things it was designed with in mind.

It's even simpler and more flexible than S-expressions.

Handles deeply nested structures perfectly well. Has only 3 characters to escape (brackets and the escape character).

(I am the author)

[0] https://jevko.org/

How do you differentiate types with jevko (numbers, strings, boolean)? Your examples on jevko.org appear lossy as they encode in the same way things that are different in JSON and I don't know how you would then differentiate between true and "true", 27 and "27", etc.
A plain Jevko parser simply turns your unicode sequence into a tree which has its fragments as leaves/labels.

No data types on that level, much like in XML.

Now above that level there are several ways to differentiate between them.

The simplest pragmatic way is a kind of type inference: if a text parses as a number, it's a number, if it's "true" or "false", it's a boolean. Otherwise it's a string. If you know the implicit schema of your data then this will be sufficient to get the job done.
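
A minimal Python sketch of that inference rule, applied to a single leaf string. The function name `infer` is made up, and Python's own `int()` and `float()` are used as the number test, which is looser than JSON's number grammar (they accept surrounding whitespace and values such as `inf`).

```python
def infer(text):
    """Guess a JSON-like type for a leaf string."""
    if text == "true":
        return True
    if text == "false":
        return False
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass  # not this kind of number, try the next option
    return text  # anything else stays a string

values = [infer(t) for t in ["27", "1.5", "true", "eth0"]]
print(values)
```

Note that this is exactly where the lossiness asked about above shows up: the string "27" and the number 27 come out the same, so the rule only works when the implicit schema never needs that distinction.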

Otherwise you employ a separate schema -- JC in particular has per-parser schemas anyway, so that's covered in this case. If it didn't, you'd need to write a schema yourself.

Or you do "syntax-driven" data types, similar to JSON, e.g. strings start w/ "'".

Here is a shitty demo: https://jevko.github.io/interjevko.bundle.html

It shows schema inference from JSON and the schemaless (syntax-driven) flavor.

Jevko itself is stable and formally specified: https://github.com/jevko/specifications/blob/master/spec-sta...

It's very easy to write a parser in any language (I've written one in several) and from there start using it.
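
As a rough illustration of how small such a parser can be, here is a Python sketch of a bracket-tree parser in the spirit of the description above. It is not a conforming implementation of the specification: the tree shape (a dict with `subjevkos` and `suffix`), the function name `parse_jevko`, and the choice of the backtick as the escape character are assumptions made for this example.

```python
def parse_jevko(source, escaper="`"):
    """Parse bracketed text into {'subjevkos': [(prefix, tree), ...], 'suffix': str}."""
    root = {"subjevkos": [], "suffix": ""}
    stack = []      # enclosing trees that are still open
    current = root
    text = []       # characters collected since the last bracket
    chars = iter(source)
    for ch in chars:
        if ch == escaper:
            # The escaper may only precede a bracket or itself.
            nxt = next(chars, None)
            if nxt is None or nxt not in "[]" + escaper:
                raise ValueError("invalid escape sequence")
            text.append(nxt)
        elif ch == "[":
            # Text before '[' becomes the prefix (label) of a new subtree.
            child = {"subjevkos": [], "suffix": ""}
            current["subjevkos"].append(("".join(text), child))
            stack.append(current)
            current, text = child, []
        elif ch == "]":
            if not stack:
                raise ValueError("unexpected closing bracket")
            # Text before ']' is the suffix (leaf text) of the closing subtree.
            current["suffix"] = "".join(text)
            current, text = stack.pop(), []
        else:
            text.append(ch)
    if stack:
        raise ValueError("missing closing bracket")
    current["suffix"] = "".join(text)
    return root

tree = parse_jevko("ip [127.15.34.23] mtu [1500]")
for prefix, sub in tree["subjevkos"]:
    print(prefix.strip(), "->", sub["suffix"])
```

The parser keeps whitespace in prefixes untouched and assigns no data types; trimming and typing are left to whatever format is layered on top.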

However, I am still very much working on specifications for formats above Jevko. I have some recent implementations of the simplest possible format which converts Jevko to arrays/objects/strings:

* https://github.com/jevko/easyjevko.lua

* https://github.com/jevko/easyjevko.js

The schema-driven format that was used in the demo is implemented here:

* https://github.com/jevko/interjevko.js

* https://github.com/jevko/jevkoschema.js