by LeonidasXIV 854 days ago
One alternative is a structured data interface, a bit like PowerShell's, where not everything has to be serialized, deserialized, and re-parsed by every program in a pipeline.
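Here is a minimal Python sketch of the difference, with Python standing in for PowerShell. The `FileEntry` type, the stage functions, and the sample data are all hypothetical. Each stage receives typed records, so a filter compares integers directly instead of splitting columns and re-parsing text.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FileEntry:
    """A typed record passed between stages instead of a line of text."""
    name: str
    size: int  # bytes, already an int: no downstream stage re-parses it


def list_files() -> Iterator[FileEntry]:
    # Hypothetical producer with hard-coded sample data standing in for `ls`.
    yield FileEntry("a.txt", 120)
    yield FileEntry("big.iso", 4_000_000)
    yield FileEntry("notes.md", 900)


def larger_than(entries: Iterable[FileEntry], min_size: int) -> Iterator[FileEntry]:
    # Filter stage: compares integers directly, no column splitting.
    return (e for e in entries if e.size > min_size)


def names(entries: Iterable[FileEntry]) -> list[str]:
    # Final stage: only here are records reduced to text for display.
    return [e.name for e in entries]


result = names(larger_than(list_files(), 500))
print(result)  # ['big.iso', 'notes.md']
```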

Another approach is an OS that is basically a complete environment where everything is code. You can see the idea in Smalltalk environments, where you can in theory interact with every object by sending it messages. Lisp machines come to mind as well, and one could even count early personal computers that booted into BASIC as an instance of this (though in BASIC, instead of everything being a file, everything is memory).


Generally I think people are happy piping structured data around; the problem is that it ties you into a particular data format. JSON seems to be winning there, and I think people would be very happy if more commands supported an environment variable that switched them to JSON output, or even if it were possible to output data on a /dev/stdjson pipe.

Anyone know how to add a new stdout-like interface to Unix-like OSes?

> The functions defined in libxo are used to generate a choice of TEXT, XML, JSON, or HTML output. A common set of functions are used, with command line switches passed to the library to control the details of the output.

https://man.freebsd.org/cgi/man.cgi?query=libxo&sektion=3&ap....

See ps(8) for example use: https://man.freebsd.org/cgi/man.cgi?query=ps&apropos=0&sekti...
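The idea in that quote can be sketched in Python: one set of output calls, plus a switch that picks the format. This is not libxo's C API. The `emit` helper, its format names, and the XML element names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def emit(records: list[dict], fmt: str) -> str:
    """Render the same records as text, JSON, or XML (hypothetical helper)."""
    if fmt == "text":
        # Human-readable columns, like a traditional Unix tool.
        return "\n".join(f"{r['pid']:>6} {r['command']}" for r in records)
    if fmt == "json":
        return json.dumps({"process": records})
    if fmt == "xml":
        root = ET.Element("process-information")
        for r in records:
            proc = ET.SubElement(root, "process")
            for key, value in r.items():
                ET.SubElement(proc, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


rows = [{"pid": 1, "command": "init"}, {"pid": 42, "command": "sh"}]
print(emit(rows, "text"))
print(emit(rows, "json"))
```

The producer describes its data once, and the caller's switch decides the presentation. That is the property that makes this approach attractive for existing tools.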

JSON is a good start, but PowerShell is great in that a date is actually a date object, which means I can operate on it without faffing around with parsing, worrying about whether the JSON I receive depends on the locale of some system, or finding that somebody didn't consider that I might want microseconds and truncated the timestamp.
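A small Python illustration of that complaint: JSON has no date type, so a timestamp must travel as a string, and whether the microseconds and the time zone survive depends entirely on how the producer formats it. The timestamp and both formatting conventions below are made up for illustration.

```python
import json
from datetime import datetime, timezone

ts = datetime(2023, 5, 17, 14, 3, 9, 123456, tzinfo=timezone.utc)

# Producer A keeps full precision and an explicit UTC offset (ISO 8601).
lossless = json.dumps({"modified": ts.isoformat()})

# Producer B "helpfully" formats to whole seconds and drops the zone.
lossy = json.dumps({"modified": ts.strftime("%Y-%m-%d %H:%M:%S")})

# The consumer has to rebuild a datetime from whatever string it was given.
back = datetime.fromisoformat(json.loads(lossless)["modified"])
truncated = datetime.fromisoformat(json.loads(lossy)["modified"])

print(back == ts)                               # True
print(truncated.microsecond, truncated.tzinfo)  # 0 None
```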
Right, but at that point you're tying it to a universal type system.

Personally I like YAML, since you can add type hints to data that can be used to turn simple text types into more complex types, but I can see why that standard wouldn't take off.
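YAML attaches this kind of hint with tags. Below is a rough JSON analogue in Python. It uses a made-up convention in which a typed value is wrapped as `{"$type": ..., "value": ...}`, and it shows how the reading side can use the hint to turn a plain string back into a richer type. The `$type` key and the `revive` function are hypothetical. `object_hook` is the standard `json.loads` parameter.

```python
import json
from datetime import datetime
from decimal import Decimal

# Hypothetical convention: a typed value travels as {"$type": ..., "value": ...}.
# A reader that ignores the convention still sees ordinary JSON objects.
DECODERS = {
    "datetime": datetime.fromisoformat,
    "decimal": Decimal,
}


def revive(obj: dict):
    """json.loads object_hook: turn tagged objects into richer Python types."""
    tag = obj.get("$type")
    if tag in DECODERS and "value" in obj:
        return DECODERS[tag](obj["value"])
    return obj  # untagged or unknown tag: keep it as a plain dict


doc = (
    '{"name": "report.pdf",'
    ' "mtime": {"$type": "datetime", "value": "2023-05-17T14:03:09.123456+00:00"},'
    ' "price": {"$type": "decimal", "value": "19.99"}}'
)
data = json.loads(doc, object_hook=revive)
print(type(data["mtime"]).__name__)     # datetime
print(data["price"] + Decimal("0.01"))  # 20.00
```

Both sides still have to agree on the tag names, which is the same "universal type system" problem raised above, just made explicit.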

Most of the Unix-philosophy people I know are interested in stuff like that; it just has to be implemented in a thoughtful way.

YAML is a crime.

I can see why you'd be worried about it; it's a lot more powerful than what I'd actually want for this.

But that's sort of part of the problem: no one is going to agree on a common universal data type.

If you're going to use something JSONesque as your common language, I think you do need to include a schema.

What value would a new standard pipe for JSON bring? It's already serialized text that can be sent to stdout. The real rub is getting programs to speak JSON natively.
>> The real rub is getting programs to speak JSON natively.

You mean getting programs to ingest arbitrary data structures? OK, JSON is not arbitrary - it is limited to a certain overall format. But it can be used to serialize arbitrarily complex data structures.

I mean that most Unix command-line tools are line-oriented: input and output are expected to be a series of text records, each on its own line. Stray too far from this lingua franca and you'll need to get creative to make piped I/O work.

JSON doesn't care so much about lines and doesn't necessarily represent an array of records, so it doesn't fit into this box.
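One common way to bridge that gap is JSON Lines (also called NDJSON): emit one compact JSON value per line, so the stream stays line-oriented. Here is a minimal Python sketch with made-up records.

```python
import json

records = [
    {"name": "a.txt", "size": 120},
    {"name": "two\nlines.txt", "size": 7},  # a real newline inside the data
]

# json.dumps escapes the embedded newline, so each record is one line.
stream = "".join(json.dumps(r) + "\n" for r in records)

# Line-oriented tools (head, grep, wc -l) still see one record per line...
print(len(stream.splitlines()))  # 2

# ...and a JSON-aware consumer can parse each line independently.
parsed = [json.loads(line) for line in stream.splitlines()]
```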

I think it would be cool if stdout worked something like the clipboard, where you have different representations and an application reading from the clipboard selects the representation it wants. I'm not sure how to avoid making it terribly wasteful, though, so that the producing application doesn't have to write out every possible format.
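Clipboards such as X11 selections avoid that waste by converting to a format only when someone requests it. Below is a Python sketch of the same lazy approach applied to a pipe. The producer advertises its formats as callables, and only the format the consumer picks is ever rendered. All names and MIME types are illustrative assumptions.

```python
import json
from typing import Callable

# Hypothetical producer: advertises formats but renders none up front.
entries = [{"name": "a.txt", "size": 120}, {"name": "b.txt", "size": 900}]
rendered: list[str] = []  # records which formats were actually produced


def as_text() -> str:
    rendered.append("text/plain")
    return "\n".join(e["name"] for e in entries)


def as_json() -> str:
    rendered.append("application/json")
    return json.dumps(entries)


OFFERS: dict[str, Callable[[], str]] = {
    "text/plain": as_text,
    "application/json": as_json,
}


def read_as(accepted: list[str]) -> tuple[str, str]:
    """Consumer side: take the first accepted type the producer offers."""
    for mime in accepted:
        if mime in OFFERS:
            return mime, OFFERS[mime]()  # only this format gets rendered
    raise LookupError("no common representation")


mime, payload = read_as(["application/json", "text/plain"])
print(mime, rendered)  # application/json ['application/json']
```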
Content negotiation.

Imagine a magical `ls` that can emit text/plain, application/json, what have you:

  ls -f text
  ls -f json
  ls -f csv
  ls -f msgpack
  ...
Now instead of specifying formats on both ends:

  ls | jq             # jq accepts application/json => ls outputs as json
  ls | fq             # fq accepts a ton of stuff => "best fit" would be picked
  ls | fq -d msgpack  # fq accepts only msgpack here => msgpack
  ls                  # stdout is a tty; the terminal on the other end says what it accepts => human-readable output
Essentially, upon opening a pipe, each program would be able to say what it can accept on the read end and what it can produce on the write end. If both ends can agree on a common in-memory binary format, they can literally throw structs over the fence, even across languages (FFI style), with no serialization required and possibly zero-copy.
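The selection step could borrow HTTP's `Accept` lists with `q` weights. Here is a simplified Python sketch. It ignores media-type parameters other than `q`, ranks exact matches over `type/*` over `*/*`, and lets the producer's earlier offers win ties. This is a thought experiment rather than an existing pipe API, and the offer list is hypothetical.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Parse an HTTP-style list like 'application/json;q=0.9, text/plain'."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                q = float(value)
        if media:
            prefs[media] = q
    return prefs


def negotiate(offered: list[str], accept_header: str) -> str:
    """Pick the producer's type with the highest q; earlier offers win ties."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media in offered:
        major = media.split("/")[0]
        # Exact match, then type/*, then */*; q=0 means "not acceptable".
        q = prefs.get(media, prefs.get(f"{major}/*", prefs.get("*/*", 0.0)))
        if q > best_q:
            best, best_q = media, q
    if best is None:
        raise ValueError("no acceptable representation")  # refuse, don't guess
    return best


offers = ["text/plain", "application/json", "application/msgpack"]
print(negotiate(offers, "application/json"))                             # application/json
print(negotiate(offers, "application/msgpack, application/json;q=0.5"))  # application/msgpack
print(negotiate(offers, "*/*"))                                          # text/plain
```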

We know how to do that:

- https://www.rfc-editor.org/rfc/rfc2616#page-71

- https://www.rfc-editor.org/rfc/rfc2616#page-100

And I mean, we really know: the last one we already do! Tons of programs check whether stdin and/or stdout is a tty via an ioctl and change their behavior based on that.
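In Python, that check is `isatty()` on the stream, which asks the OS whether the underlying file descriptor is a terminal. Below is a minimal sketch of a tool that switches its output style on that check. The JSON Lines fallback and the sample entries are assumptions for illustration.

```python
import io
import json
import sys


def render(entries: list[dict], stream) -> str:
    """Pick an output style from whether `stream` is a terminal."""
    if stream.isatty():
        # A human is probably watching: aligned columns.
        return "\n".join(f"{e['name']:<12}{e['size']:>8}" for e in entries)
    # Not a terminal (pipe, file, ...): assume a program is reading and
    # emit JSON Lines. This fallback is a hypothetical choice.
    return "\n".join(json.dumps(e) for e in entries)


entries = [{"name": "a.txt", "size": 120}, {"name": "notes.md", "size": 900}]
print(render(entries, sys.stdout))      # columns in a terminal, JSON when piped
piped = render(entries, io.StringIO())  # StringIO.isatty() is always False
```

Today this is a one-sided guess. Negotiation would replace it with an answer from the other end of the pipe.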

It'd allow a bunch of interesting stuff, like:

- `cat` would write application/octet-stream, and the terminal would know that raw binary is being cat'd to its tty and thus ignore escape codes, while a program aiming to control the tty would declare that it writes application/tty or something.

- progressive enhancement: negotiation-unaware programs (our status quo) would default to text/plain (which is no more broken than the current situation) or to some application/unix-pipe type.

- when both ends fail to negotiate, it would raise SIGPIPE and yell at you. Same for encoding: no more "oops, UTF-8 on one end, Latin-1 on the other".
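The encoding half of that last point is easy to demonstrate in Python. When the same bytes are read under the wrong charset assumption, they either silently turn into mojibake or fail outright. The sample string is illustrative, and the `declared` variable is a stand-in for a hypothetically negotiated charset.

```python
text = "café"

# Producer writes UTF-8: "é" becomes the two bytes 0xC3 0xA9.
wire = text.encode("utf-8")

# Consumer silently assumes Latin-1: no error, just mojibake.
as_latin1 = wire.decode("latin-1")
print(as_latin1)  # cafÃ©

# The reverse mismatch (Latin-1 bytes read as UTF-8) at least fails loudly.
try:
    text.encode("latin-1").decode("utf-8")
except UnicodeDecodeError as err:
    print("refused:", err.reason)

# With the charset declared up front, the consumer decodes correctly
# or can refuse instead of guessing.
declared = "utf-8"  # hypothetical: learned during negotiation
print(wire.decode(declared))  # café
```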

Interesting post! When prototyping what would end up being fq, I did quite a lot of tinkering with how it could work split up into multiple tools, used like: <decode tool> | <query tool> | <display tool>. It never felt very neat, and the problem is: what would be piped? I tried JSON so that <query tool> could be jq, but whatever was piped would have to be quite verbose for <display tool> to be able to show hex dumps, value mappings, descriptions, etc. So in the end I went with more or less the same design, but inside jq, where the values piped between filters are kind of "polymorphic" JSON values. They behave like JSON values but can also be queried for which bit range and buffer they originate from, whether the value has a symbolic representation, a description, etc. Maybe an interesting note about fq: for formats like msgpack it can decode in two modes. By default it decodes msgpack into a "meta" mode, where things like integer encodings and sizes can be seen, and then there is a "representation" mode that is the JSON value of what it represents.
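The two modes can be illustrated with msgpack's unsigned integers. The following is a small Python sketch, not fq's code. The function names and the dict layout are hypothetical. The tag bytes come from the msgpack format: 0x00 to 0x7f is a positive fixint, and 0xcc to 0xcf are uint8 to uint64. The "meta" view keeps the encoding type and byte range, while the "representation" view is just the value.

```python
import struct

# Unsigned integer encodings from the msgpack spec; the rest is out of scope.
UINT_FORMATS = {0xCC: ("uint8", ">B"), 0xCD: ("uint16", ">H"),
                0xCE: ("uint32", ">I"), 0xCF: ("uint64", ">Q")}


def decode_uint(buf: bytes) -> dict:
    """'Meta' view: the value plus how and where it was encoded."""
    tag = buf[0]
    if tag <= 0x7F:  # positive fixint: the value lives in the tag byte itself
        return {"type": "positive_fixint", "value": tag, "bytes": (0, 1)}
    if tag in UINT_FORMATS:
        name, fmt = UINT_FORMATS[tag]
        size = struct.calcsize(fmt)
        (value,) = struct.unpack_from(fmt, buf, 1)  # big-endian payload
        return {"type": name, "value": value, "bytes": (0, 1 + size)}
    raise ValueError(f"not an unsigned integer: 0x{tag:02x}")


def representation(meta: dict) -> int:
    """'Representation' view: the plain value, encoding details dropped."""
    return meta["value"]


small = decode_uint(bytes([0x05]))
padded = decode_uint(bytes([0xCD, 0x00, 0x05]))
print(small)   # {'type': 'positive_fixint', 'value': 5, 'bytes': (0, 1)}
print(padded)  # {'type': 'uint16', 'value': 5, 'bytes': (0, 3)}
print(representation(small) == representation(padded))  # True
```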
That's not an improvement. Have you heard of "cat -v considered harmful"?

http://harmful.cat-v.org/cat-v/

I think we've all heard of it, but I for one don't find it persuasive.