Hacker News
by Joker_vD 815 days ago
There were several reasons why pipes were added to Unix, and the ability to run producer/consumer processes concurrently was one of them. Before that (and for many years after on non-Unix systems), the most prevalent paradigm was indeed to run multi-stage pipelines with the moral equivalent of the following:

    stage1.exe /in:input.dat /out:stage1.dat
    stage2.exe /in:stage1.dat /out:stage2.dat
    del stage1.dat
    stage3.exe /in:stage2.dat /out:result.dat
    del stage2.dat

Pipes are so useful. I find myself more and more using shell script and pipes for complex multi-stage tasks. This also simplifies any non-shell code I must write, as there are already high quality, performant implementations of hashing and compression algorithms I can just pipe to.
My biggest annoyance is when I get some tooling from some other team, and they're like "oh just extend this Python script". It'll operate on local files, using shell commands, in a non-reentrant way, and the only way to customize it is by commenting out code. Maybe there's some argparse, but you end up writing a program using their giant args as primitives.

Guys just write small programs and chain them. The wisdom of the ancients is continuously lost.

Python comes with a built-in module called fileinput that makes this very easy. It reads the files named in sys.argv[1:], or reads from stdin if no files are given or a name is a dash.

https://docs.python.org/3/library/fileinput.html
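For anyone who hasn't used it, here is a minimal sketch of that pattern. The function name `error_lines` and the "ERROR" marker are made up for illustration; only the `fileinput` calls come from the standard library.

```python
import fileinput


def error_lines(files=None):
    """Yield (source, line_number, text) for every line containing "ERROR".

    With files=None, fileinput reads the files named in sys.argv[1:],
    falling back to stdin when none are given; the name "-" also means stdin.
    """
    with fileinput.input(files=files) as stream:
        for line in stream:
            if "ERROR" in line:  # hypothetical filter condition
                yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

A script that loops over `error_lines()` and prints each tuple should then work both with file arguments and at the end of a pipe, without any extra argument handling.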

I would recommend the python sh module instead of writing bash for more complex code. Python’s devenv and tooling are way more mature and safer.
It's just a preference thing. I loathe the small program chaining style and cannot work with it at all. Give me a python script and I'm good though. I can't for the life of me imagine why people would want to do pseudo programming through piping magic when chaining is so limited compared to actual programming.
This is of course a false dichotomy. There's nothing pseudo about using bash (perhaps you mean sudo?), and bash scripts orchestrate what you call 'actual' programs.

I commonly write little python scripts to filter logs, which I have read from stdin. That means I can filter a log to stdout:

    cat logfile.log | python parse_logs.py

Or filter them as they're generated:

    tail -f logfile.log | python parse_logs.py

Or write the filtered output to a file:

    cat logfile.log | python parse_logs.py > filtered.log

Or both:

    tail -f logfile.log | python parse_logs.py | tee filtered.log

It would be possible, I suppose, to configure a single python script to do all those things, with flags or whatever.

But who on Earth has the time for that?
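A filter like that can be very small. The following is only a guess at what such a parse_logs.py might look like, not the actual script; the function names and the "ERROR" match are assumptions for illustration.

```python
import sys


def filter_log(lines, out, needle="ERROR"):
    """Copy every line containing `needle` from `lines` to `out`."""
    for line in lines:
        if needle in line:  # hypothetical filter rule
            out.write(line)
            # Flush per line so output shows up promptly when the input
            # is a live stream such as `tail -f`.
            out.flush()


def main():
    # stdin and stdout are the only interface; the shell decides
    # where the data comes from and where it goes.
    filter_log(sys.stdin, sys.stdout)
```

Calling `main()` at the bottom of the file (usually behind an `if __name__ == "__main__":` guard) would make it usable in all four pipelines, since redirection and `tee` are handled entirely by the shell.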

Chaining pipes in python is quite obnoxious.
"The programmer scoffed at Master Foo and rose to depart. But Master Foo nodded to his student Nubi, who wrote a line of shell script on a nearby whiteboard, and said: “Master programmer, consider this pipeline. Implemented in pure C, would it not span ten thousand lines?”"

http://catb.org/~esr/writings/unix-koans/ten-thousand.html

Ugh. I don’t feel that the spirit of those satirical Zen Koans is to be so self-congratulatory.
What programming language do you use where there aren't performant hashing/compression algorithms implemented as libraries?
Well, they all do, but in terms of ease of use, tar and zip are much simpler to implement in a cli pipeline than to write bespoke code. At least that has been my experience.
It is hard to compete with "| gzip" in any programming language. Just importing a library and you're already well past that. Just typing "import" and you're tied! Over budget if I drop the space in "| gzip".

This is one of the reasons why, for all its faults, shell just isn't going anywhere any time soon.

It is hard to compete with.

You can also (assuming your language supports it) execute gzip and, assuming your language gives you some writable handle to the pipe, write data into it. So you get the concurrency "for free", but you don't have to go all the way to "do all of it in process".
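In Python, for example, that could look roughly like the sketch below. It assumes a `gzip` binary is on the PATH; the function name `gzip_to_file` is invented for the example.

```python
import subprocess


def gzip_to_file(chunks, dest_path):
    """Stream an iterable of bytes chunks through an external gzip into dest_path."""
    with open(dest_path, "wb") as dest:
        # gzip reads our pipe on its stdin and writes compressed bytes
        # straight to the destination file, running concurrently with us.
        proc = subprocess.Popen(["gzip", "-c"], stdin=subprocess.PIPE, stdout=dest)
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        finally:
            proc.stdin.close()  # end of input: lets gzip finish and exit
            returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, proc.args)
```

The compression runs in the child process while the parent keeps producing data, which is the same producer/consumer overlap a shell pipe gives you.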

I've also done the "trick" of executing [bash, -c, <stuff>] in a higher-level language, too. I'd personally rather see the work better suited for the higher-level language done in that language, but if shell is easier, then so be it.

It's sort of like unsafe blocks: minimize the shell to a reasonable portion, clearly define the inputs/outputs, and make sure you're not vulnerable to shell-isms, as best as you can, at the boundary.
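One way to keep that boundary tight, sketched in Python: keep the script text constant and hand outside values over as positional parameters, so they are never parsed as shell syntax. The function name and the grep/wc pipeline are made up for illustration, and the sketch assumes bash, grep and wc are installed.

```python
import subprocess


def count_matching_lines(path, pattern):
    """Run a small fixed shell pipeline, passing outside values as arguments.

    The script text is a constant; `path` and `pattern` arrive as "$1" and
    "$2", so shell metacharacters inside them are treated as plain data.
    """
    script = 'grep -F -- "$2" "$1" | wc -l'
    result = subprocess.run(
        # The first argument after the script text becomes $0, the rest $1, $2.
        ["bash", "-c", script, "bash", path, pattern],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())
```

The inputs are two strings and the output is one integer, so the shell part stays small and easy to audit.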

But I still think I see the reverse far more often. Like, `steam` is … all the time, apparently … exec'ing a shell to then exec … xdg-user-dir? (And the error seems to indicate that that's it…) Which seems more like the sort of "you could just exec this yourself?". (But Steam is also mostly a web-app of sorts, so, for all I know there's JS under there, and I think node is one of those "makes exec(2) hard/impossible" langs.)

    import os
    import os.subprocess  # is that right?
    subprocess.execute(f'tar cvzf t.tar.gz {' '.join(list_of_files)}')

Did I do that right?

or was it

`tar cvzf t.tar.gz *`
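For comparison, a version that should actually run uses the top-level `subprocess` module and an argument list instead of a joined string. This is a sketch that assumes a `tar` binary on the PATH; `make_tarball` and its `cwd` parameter are names chosen for the example.

```python
import subprocess


def make_tarball(archive_path, files, cwd=None):
    """Create a gzip-compressed tar archive by running tar without a shell."""
    # Each file name is its own argv entry, so names with spaces or quotes
    # need no escaping. `cwd` sets the directory the names are relative to.
    subprocess.run(["tar", "-czf", archive_path, *files], check=True, cwd=cwd)
```

It is still more typing than the shell one-liner, and there is no `*` expansion unless you add something like `glob.glob` yourself.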

Sometimes you want the intermediate files as well, though. For example, if doing some kind of exploratory analysis of the different output stages of the pipeline, or even just for debugging.

Tee can be useful for that. Maybe pv (pipe viewer) too. I have not tried it yet.