
by sgarland 565 days ago

> it would be awesome if every Unix utility output structured data like JSON

I see this argument a lot, and I think it overlaps heavily with the struggles I see, as a DBRE, from devs trying to grok RDBMS.

Most (?) people working with web apps have become accustomed to JSON, and fully embrace its nesting capabilities. It’s remarkably convenient to be able to deeply nest attributes. RDBMS, of course, are historically flat. SQL:1999 added fixed-size, single-depth arrays, and SQL:2003 expanded that to include arbitrary nesting and size; SQL:2016 added JSON. Still, the traditional (and IMO, correct) way to use RDBMS is to treat data as having relationships to other data, and to structure it accordingly. It’s challenging to do, especially when the DB providers have native JSON types available, but the reasons why you should are numerous (referential integrity, size efficiency, performance…).
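To make the referential-integrity point concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The table and column names (users, orders, orders_json) are hypothetical, chosen only for illustration, and other databases will differ in the details. The same bad reference is accepted when it is buried in a JSON document and rejected when it is a foreign key.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: one relational pair of tables, one JSON-blob table.
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL REFERENCES users(id),
        total_cents INTEGER NOT NULL
    );
    CREATE TABLE orders_json (
        id  INTEGER PRIMARY KEY,
        doc TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO orders (user_id, total_cents) VALUES (1, 1250)")

# The JSON version happily stores an order for user 999, who does not exist.
bad_order = {"user_id": 999, "total_cents": 1250}
conn.execute("INSERT INTO orders_json (doc) VALUES (?)", (json.dumps(bad_order),))

# The relational version refuses the same order.
try:
    conn.execute("INSERT INTO orders (user_id, total_cents) VALUES (999, 1250)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling order rejected by the foreign key:", rejected)
```

The JSON column can of course be validated in application code, but then every writer has to remember to do it; the foreign key makes the database do it once.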

Unix tooling is designed with plaintext output in mind because it’s simple, every other tool in the ecosystem understands it, and authors can rest assured that future tooling in the same ecosystem will also understand it. It’s a standard.

JSON is of course also a standard, but I would argue that on a pure CLI basis, the tooling supporting JSON as a first-class citizen (jq) is far more abstruse than, say, sed or awk. To be fair, a lot of that is probably due to the former’s functional programming paradigm, which is foreign to many.

Personally, I’m a fan of plaintext, flat output simply because it makes it extremely easy to parse with existing tooling. I don’t want to have to fire up Python to do some simple data manipulation; I want to pipe output.

1 comment

Analemma_ 565 days ago

If there were some kind of standard or widely-followed convention for how Unix tools print plaintext, I wouldn't mind it so much. What bothers me is that you need to memorize a different flag and output format for each tool, followed by an awk command whose intent is generally very opaque. By contrast, for all its faults, the fact that PowerShell has unambiguous syntax for "select this field from the output" helps a lot with both reading and writing. E.g., to get your IP address, "Get-NetIPAddress | ? {$_.InterfaceAlias -like 'Ethernet' -or $_.InterfaceAlias -like 'Wi-Fi'} | select IPAddress" is a lot clearer in intent than the Unix equivalent regex soup, and it can be written without looking anything up: print the raw output of "Get-NetIPAddress" to the shell and see what you need to filter/select on. You can even get tab completion to help.

A hypothetical Unix equivalent doesn't need to be JSON; I just brought that up as an example. But any structured data would be an improvement over the situation now, and as the PS example shows, with the appropriate ecosystem tooling you can do all the data manipulation over pipes.
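As a rough illustration of "filter on a field, then select a field" over structured records, here is a minimal Python sketch. It assumes a hypothetical tool that emits a JSON array of records using the same field names as the PowerShell example (InterfaceAlias, IPAddress); the sample records and the select_addresses helper are made up for illustration and are not the output of any real utility.

```python
import json

# Hypothetical structured output, as some tool might emit it.
raw = """[
  {"InterfaceAlias": "Ethernet", "IPAddress": "192.0.2.10"},
  {"InterfaceAlias": "Loopback", "IPAddress": "127.0.0.1"},
  {"InterfaceAlias": "Wi-Fi",    "IPAddress": "192.0.2.22"}
]"""


def select_addresses(records, aliases):
    """Keep records whose InterfaceAlias is in aliases; return their IPAddress."""
    return [r["IPAddress"] for r in records if r["InterfaceAlias"] in aliases]


addresses = select_addresses(json.loads(raw), {"Ethernet", "Wi-Fi"})
print(addresses)
```

The filter and the selection both refer to fields by name, so the intent survives even if the tool later adds or reorders fields, which is the property positional parsing of plaintext lacks.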