by crimsonalucard5 2192 days ago
Unless you use some stateful program or you're writing to a file, a Unix expression that uses purely reads and pipes is typically deterministic, meaning that if you run the same expression twice you get the same result.

If you want to get pedantic, fine, but the majority of use cases are stateless.

1 comment

It's kind of an interesting pattern to think of the pipeline as a whole being purely deterministic while its initial input comes from the environment (kind of like the Haskell IO monad, right?).

Most complex shell pipelines probably follow this pattern in practice, but the first counterexamples I thought of were shuf and sort -R. Many pipelines that include any kind of interpolation or variable substitution don't follow it either.
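
To make the distinction concrete, here is a rough Python sketch (not real shell) that models each pipeline stage as a function from a list of lines to a list of lines. The stage names are just stand-ins I picked for grep -v, sort, uniq and shuf; the only point is that the first pipeline is a pure function of its input, while the second also depends on a random source.

```python
import random
from functools import reduce

def pipeline(*stages):
    """Compose stages left to right, loosely like `a | b | c` in a shell."""
    return lambda lines: reduce(lambda data, stage: stage(data), stages, lines)

# Pure stages: hypothetical stand-ins for `grep -v '^#'`, `sort` and `uniq`.
def drop_comments(lines):
    return [line for line in lines if not line.startswith("#")]

def sort_lines(lines):
    return sorted(lines)

def uniq(lines):
    # Like uniq(1), this only collapses adjacent duplicates.
    out = []
    for line in lines:
        if not out or out[-1] != line:
            out.append(line)
    return out

# Impure stage: stand-in for `shuf`. It reads the global random generator,
# so its output is not determined by its input alone.
def shuf(lines):
    shuffled = list(lines)
    random.shuffle(shuffled)
    return shuffled

deterministic = pipeline(drop_comments, sort_lines, uniq)
randomized = pipeline(drop_comments, shuf)

data = ["b", "# note", "a", "b", "c"]
first_run = deterministic(data)
second_run = deterministic(data)   # always equal to first_run
shuffled_run = randomized(data)    # same lines, order may differ per run
```

The deterministic pipeline can be reasoned about like a function; the randomized one can only be reasoned about up to the order of its output.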

A different insight for reasoning about pipelines might be that the byte string (or ASCII string) type is too weak to catch most kinds of errors, and it's not uncommon for one pipeline component to not, in fact, be a total function with respect to the actual data type that you're trying to capture (I don't know the right terminology for this). This most famously happens when you do regular expression substitutions but your regular expression doesn't actually match the full grammar that you're looking for. Then the pipeline can be incorrect as a whole for some inputs, but earlier and later stages don't notice. It can also happen anywhere that different tools have a different implicit understanding of the relevant grammar or structure.
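
A small Python illustration of the regex case, under the assumption that the "real" data type is CSV with quoted fields. The substitution below is the kind of thing one might write as a sed one-liner to strip the first two fields; both function names are made up for the example. It is right for simple lines and silently wrong for a line with a quoted comma, and nothing downstream can tell, because the output is still just a string.

```python
import csv
import re

def naive_third_field(line):
    # Assumes the grammar is "fields separated by commas" and strips the
    # first two fields with a substitution. Quoted commas are not handled.
    return re.sub(r"^[^,]*,[^,]*,", "", line)

def csv_third_field(line):
    # Uses a parser that knows the quoting rules of the format.
    return next(csv.reader([line]))[2]

simple = 'alice,Smith,42'
quoted = 'alice,"Smith, Jr.",42'

naive_simple = naive_third_field(simple)   # '42'
naive_quoted = naive_third_field(quoted)   # ' Jr.",42' (wrong, but no error)
parsed_quoted = csv_third_field(quoted)    # '42'
```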

That connects up with all sorts of other ideas which are actually about type safety and parsing more than FP. For example, PowerShell has tried to take the pipeline concept in a different direction, to the consternation of us Unix purists. Its use of typed objects in this context makes more explicit what the contract between programs in the pipeline is supposed to be. There is also a LANGSEC connection in terms of the risks of informal or underspecified parsers and grammars.
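
This is not how PowerShell itself works, just a loose Python analogy for the typed-object idea. LogEntry and the '<user> <status>' line format are hypothetical. The point is that the parse step rejects input it doesn't understand, and every later stage works on a record with named, typed fields instead of re-parsing text with its own assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    # Hypothetical record type; the contract between stages is explicit.
    user: str
    status: int

def parse(line: str) -> LogEntry:
    # Fail loudly on anything outside the expected '<user> <status>' shape,
    # rather than passing a malformed string on to the next stage.
    parts = line.split()
    if len(parts) != 2 or not parts[1].isdigit():
        raise ValueError(f"not a '<user> <status>' line: {line!r}")
    return LogEntry(user=parts[0], status=int(parts[1]))

def failures(entries):
    # Later stages use typed fields; no regexes over raw text.
    return [entry for entry in entries if entry.status >= 500]

def users(entries):
    return [entry.user for entry in entries]

entries = [parse(line) for line in ["alice 200", "bob 503"]]
failed_users = users(failures(entries))
```

With plain strings, a line like "garbage" would flow through every stage and produce some output; here parse("garbage") raises ValueError at the boundary.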

I know I've written lots of Unix pipelines that were correct for all the inputs that I personally threw at them, but definitely not correct for every possible input. I like to use ! in vim frequently to shell out to a command line to perform a text editing task, and vim has an associated undo and redo, which means that sometimes I'm trying several variants until I find the one that appears to successfully do the edit that I intended. Sometimes I do this at an almost preconscious level, which is really strange in terms of thinking about the concept of the correctness of a program (in this case, where the program is literally only going to be used once, with the programmer looking over its shoulder).