Hacker News
by jakogut 824 days ago
Pipes are so useful. I find myself using shell scripts and pipes more and more for complex multi-stage tasks. This also simplifies any non-shell code I must write, as there are already high-quality, performant implementations of hashing and compression algorithms I can just pipe to.
3 comments

My biggest annoyance is when I get some tooling from some other team, and they're like "oh just extend this Python script". It'll operate on local files, using shell commands, in a non-reentrant way, and the only way to customize it is by commenting out code. Maybe there's some argparse, but you end up writing a program using their giant args as primitives.

Guys just write small programs and chain them. The wisdom of the ancients is continuously lost.

Python comes with a built-in module called fileinput that makes this very easy. It iterates over the lines of every file named in sys.argv[1:], and reads from stdin instead if that list is empty or when a file name is a dash.

https://docs.python.org/3/library/fileinput.html
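
For what it's worth, here is a minimal sketch of that pattern. The function names and the "ERROR" needle are made up for illustration; only `fileinput.input` comes from the standard library.

```python
import fileinput
import sys


def matching_lines(lines, needle):
    """Yield the lines that contain `needle` (a hypothetical filter rule)."""
    for line in lines:
        if needle in line:
            yield line


def main(argv):
    # fileinput.input(files=...) reads the named files in order; an empty
    # list falls back to stdin, and a file name of "-" also means stdin.
    with fileinput.input(files=argv) as lines:
        sys.stdout.writelines(matching_lines(lines, "ERROR"))


# In a script, call main(sys.argv[1:]) under an `if __name__ == "__main__":` guard.
```

With that call at the bottom, the same script would work as `python script.py a.log b.log` or at the end of a pipe.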

I would recommend the python sh module instead of writing bash for more complex code. Python’s dev environment and tooling are way more mature and safer.
It's just a preference thing. I loathe the small program chaining style and cannot work with it at all. Give me a python script and I'm good though. I can't for the life of me imagine why people would want to do pseudo programming through piping magic when chaining is so limited compared to actual programming.
This is of course a false dichotomy, there's nothing pseudo about using bash (perhaps you mean sudo?) and bash scripts orchestrate what you call 'actual' programs.

I commonly write little python scripts to filter logs, which I have read from stdin. That means I can filter a log to stdout:

   cat logfile.log | python parse_logs.py
Or filter them as they're generated:

   tail -f logfile.log | python parse_logs.py
Or write the filtered output to a file:

   cat logfile.log | python parse_logs.py > filtered.log 
Or both:

   tail -f logfile.log | python parse_logs.py | tee filtered.log
It would be possible, I suppose, to configure a single python script to do all those things, with flags or whatever.

But who on Earth has the time for that?
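
For anyone curious, a stdin-to-stdout filter like the parse_logs.py used above can be very small. This is only a sketch: the WARN/ERROR rule and the function name are assumptions, not the actual script.

```python
import re
import sys

# Hypothetical rule: keep lines that mention WARN or ERROR as whole words.
PATTERN = re.compile(r"\b(WARN|ERROR)\b")


def filter_log(source, sink):
    """Copy matching lines from source to sink, one line at a time."""
    for line in source:
        if PATTERN.search(line):
            sink.write(line)
            # Flush per line so `tail -f ... | python parse_logs.py` shows
            # matches as they arrive rather than when a buffer fills.
            sink.flush()


# As a script, the whole program would be: filter_log(sys.stdin, sys.stdout)
```

Because it only touches the two streams it is handed, the shell decides where input comes from and where output goes, which is what makes the four invocations above work unchanged.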

Chaining pipes in python is quite obnoxious.
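
To make that concrete, here is roughly what the two-stage `sort | uniq` looks like with the standard subprocess module. It is a sketch for illustration (the function name is made up), and it assumes `sort` and `uniq` are on the PATH.

```python
import subprocess


def sort_uniq(data):
    """Run the equivalent of `sort | uniq` over `data` (bytes); return bytes."""
    sort = subprocess.Popen(["sort"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq"], stdin=sort.stdout, stdout=subprocess.PIPE)
    # Drop our copy of the read end so `sort` gets SIGPIPE if `uniq` exits early.
    sort.stdout.close()
    # Feed the first stage, then close its stdin to signal end of input.
    sort.stdin.write(data)
    sort.stdin.close()
    output, _ = uniq.communicate()
    sort.wait()
    return output
```

That is a dozen lines of handle bookkeeping for what the shell spells with a single `|`.
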
"The programmer scoffed at Master Foo and rose to depart. But Master Foo nodded to his student Nubi, who wrote a line of shell script on a nearby whiteboard, and said: “Master programmer, consider this pipeline. Implemented in pure C, would it not span ten thousand lines?”"

http://catb.org/~esr/writings/unix-koans/ten-thousand.html

Ugh. I don’t feel that the spirit of those satirical Zen Koans is to be so self-congratulatory.
What programming language do you use where there aren't performant hashing/compression algorithms implemented as libraries?
Well they all do, but in terms of ease of use, tar and zip are much simpler to use in a CLI pipeline than writing bespoke code. At least that has been my experience.
It is hard to compete with "| gzip" in any programming language. Just importing a library and you're already well past that. Just typing "import" and you're tied! Overbudget if I drop the space in "| gzip".

This is one of the reasons why, for all its faults, shell just isn't going anywhere any time soon.

> It is hard to compete with.

You can also (assuming your language supports it), execute gzip, and assuming your language gives you some writable-handle to the pipe, then write data into it. So, you get the concurrency "for free", but you don't have to go all the way to "do all of it in process".
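
In Python that could look something like the sketch below. It assumes a `gzip` binary on the PATH; the function name and arguments are hypothetical.

```python
import subprocess


def gzip_to_file(chunks, out_path):
    """Stream byte chunks through an external `gzip` process into out_path."""
    with open(out_path, "wb") as out:
        # With -c and no file arguments, gzip compresses stdin to stdout.
        proc = subprocess.Popen(["gzip", "-c"], stdin=subprocess.PIPE, stdout=out)
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        finally:
            proc.stdin.close()  # signals end of input to gzip
            returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, "gzip")
```

The compression runs in the child process while the parent keeps producing chunks, so the concurrency comes from the pipe rather than from threads in your own code.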

I've also done the "trick" of executing [bash, -c, <stuff>] in a higher-level language, too. I'd personally rather see the work that suits the higher-level language done in that language, but if shell is easier, then so be it.

It's sort of like unsafe blocks: minimize the shell to a reasonable portion, clearly define the inputs/outputs, and make sure you're not vulnerable to shell-isms, as best as you can, at the boundary.
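
One way to keep that boundary tight, sketched in Python: keep the shell text a constant and hand values in as positional parameters instead of formatting them into the string. The script, function name, and task here are made up for illustration, and it assumes bash, sort, uniq, and wc are available.

```python
import subprocess

# The shell portion is a fixed string; no Python values are formatted into it.
SCRIPT = 'set -euo pipefail; sort -- "$1" | uniq | wc -l'


def count_unique_lines(path):
    """Return the number of distinct lines in the file at `path`."""
    result = subprocess.run(
        # "_" fills $0; `path` arrives as "$1", so spaces or `;` in it are just data.
        ["bash", "-c", SCRIPT, "_", path],
        check=True,            # raise if the pipeline fails
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())
```

The input is one path, the output is one integer, and everything shell-shaped lives in a single string you can audit.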

But I still think I see the reverse far more often. Like, `steam` is … all the time, apparently … exec'ing a shell to then exec … xdg-user-dir? (And the error seems to indicate that that's it…) Which seems more like the sort of "you could just exec this yourself?". (But Steam is also mostly a web-app of sorts, so, for all I know there's JS under there, and I think node is one of those "makes exec(2) hard/impossible" langs.)

    import os
    import os.subprocess  # is that right?
    subprocess.execute(f'tar cvzf t.tar.gz {' '.join(list_of_files)}')

Did I do that right?

or was it

`tar cvzf t.tar.gz *`

    import subprocess

    subprocess.run(['tar', 'cvzf', 't.tar.gz', *list_of_files])
or indeed

    import os, subprocess

    subprocess.run(['tar', 'cvzf', 't.tar.gz', *(f.path for f in os.scandir('.'))])
if you need files from the current directory