by rbanffy 216 days ago
I have a distaste for the verboseness of PowerShell, but I also have concerns with the attempt to bake in complex objects into the pipeline. When you do that, programs up and down the stream need to be aware of that - and that makes it brittle.

One key aspect of the Unix way is that the stream is of bytes (often interpreted as characters) with little to no hint as to what's inside it. This way, tools like `grep` and `awk` can be generic and work on anything while others such as `jq` can specialize and work only on a specific data format, and can do more sophisticated manipulation because of that.
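
A minimal sketch of the generic-tool point, assuming a made-up stream that mixes two unrelated line formats. The sample data and variable names are hypothetical; only standard `grep` and `awk` behavior is used.

```shell
# Hypothetical sample data: CSV-like lines and a log line in one byte stream.
data='alice,42,admin
2024-01-01 ERROR disk full
bob,7,guest'

# grep counts lines containing a comma without knowing any format.
comma_lines=$(printf '%s\n' "$data" | grep -c ',')

# awk splits on whitespace by default, again with no schema involved.
first_words=$(printf '%s\n' "$data" | awk '{print $1}')

echo "$comma_lines"
echo "$first_words"
```

Neither tool is told what the lines mean; interpreting the result is left entirely to whoever wrote the pipeline.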

2 comments

> I have a distaste for the verboseness of PowerShell, but I also have concerns with the attempt to bake in complex objects into the pipeline. When you do that, programs up and down the stream need to be aware of that - and that makes it brittle.

Yeah, you definitely can't write tools for the Unix shell that assume some kind of self-describing message encoding. I mean, you could, but you'd have to do a lot of work to wrap them so that they can work with Unix byte streams at the edges. I believe Oil Shell and Nushell have prior art on this. To your point, it should be telling that those are shells of their own, rather than tools for existing Unix shells.

> One key aspect of the Unix way is that the stream is of bytes (often interpreted as characters) with little to no hint as to what's inside it. This way, tools like `grep` and `awk` can be generic and work on anything while others such as `jq` can specialize and work only on a specific data format, and can do more sophisticated manipulation because of that.

This seems backwards to me. grep and awk are extremely fragile because they have to look at what's inside. They have to read every byte, and the user of grep and awk must understand entirely what the incoming data will be.

Whereas with PowerShell or any other system with self-describing messages, the user makes some lightweight assertions about the abstract shape of the data--not the concrete shape of that data's representation.

> When you do that, programs up and down the stream need to be aware of that - and that makes it brittle.

You can go from objects to good old text processing with *nix tools, no problem.

Instead of using 100% PowerShell to count all lines in the text files:

  gc *.txt | measure | select -ExpandProperty count

you can switch to `wc` if you like:

  gc *.txt | wc -l

`gc` is Get-Content – basically cat. You can also use awk, sed, jq etc.

What GP talks about is illustrated with the following modification of your example:

  PS /home/user> gc *.txt | measure | cat | select -ExpandProperty Count
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.
  Select-Object: Property "Count" cannot be found.

Essentially at every step you need to consider whether the preceding command outputs objects or not. This isn't the case with the Unix way. The pipeline always carries a stream of bytes. You only have to consider how to interpret that stream.

> Essentially at every step you need to consider whether the preceding command outputs objects or not.

This is true, no matter what language, paradigm, or even universe you're in; data that gets passed into a pipeline needs to have the abstract shape that the pipeline expects. This is always true, and it's every bit as true of unix byte streams.

You can of course have pipelines that try to coerce data or assert its structure. The PowerShell example you showed does the latter, and raises an error message that the assertion failed.

Unix byte streams do neither. There's no coercion, no assertion. Just blind trust. When you have IFS set incorrectly, you simply get a wrong answer. When you grab the wrong field number with cut or awk, you get a wrong (or empty) answer. The input data matters every bit as much with unix as it does with every other system of computation. What changes are characteristics like brittleness and enforceability of invariants.
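
A rough sketch of that failure mode, assuming a hypothetical two-column `name,size` input. The data and variable names are made up for illustration; the `awk` and `cut` behavior shown is standard.

```shell
# Hypothetical input: name,size, comma-separated.
data='report.txt,120
notes.txt,80'

# Correct field: sums the sizes.
right=$(printf '%s\n' "$data" | awk -F, '{s += $2} END {print s}')

# Wrong field number: $3 does not exist, so awk treats it as empty (0 in arithmetic).
wrong=$(printf '%s\n' "$data" | awk -F, '{s += $3} END {print s}')
wrong_status=$?   # exit status of the pipeline above

# Wrong delimiter: cut passes lines without the delimiter through unchanged.
cut_out=$(printf '%s\n' "$data" | cut -d';' -f2)

echo "right=$right wrong=$wrong status=$wrong_status"
echo "$cut_out"
```

The wrong-field pipeline reports a total of 0 with exit status 0, and the wrong-delimiter `cut` echoes its input back unchanged. Nothing in either pipeline signals that the assumption about the data was off.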

jcgl already answered this well. My example was merely to show that you can go from PS objects to old fashioned text processing without issues. Any non-PS tool will just receive the text representation of the objects – exactly as you see them in the terminal. I wasn't trying to imply you can go back and forth between them like magic (which should be obvious).

Once you have lost the objects and work with simple text, you have to use the text processing tools of PS, if that's what you want to do. To continue your example:

  gc *.txt | measure | cat | sls count

sls (Select-String) is like grep.

(Note: this example is nonsense, just to show that it works)