by mar77i 3171 days ago
Dude! You need help with using your tools' features to reduce your pipeline lengths.

- grep derp | sed whatever can be folded into sed -n '/derp/{whatever;p}' (plain sed '/derp/ whatever' would also print the non-matching lines; you might be interested in sed -r for that matter)

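- sed whatever | sed whatever\ else is the same as sed -e whatever -e whatever\ else (or sed 'whatever;whatever else', with the script quoted so the shell doesn't eat the semicolon)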
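
A rough sketch of both rewrites, assuming a POSIX-ish sed and a made-up three-line input (the sample text and the s/e/E/ command are purely illustrative). Note that -n and p are needed if the single sed should drop non-matching lines the way grep does:

```shell
#!/usr/bin/env bash
# Hypothetical sample input, just for illustration.
input=$(printf 'derp one\nskip me\nderp two')

# grep | sed: two external processes.
piped=$(printf '%s\n' "$input" | grep derp | sed 's/e/E/')

# Single sed: -n suppresses default output, the /derp/ address selects
# matching lines, and p prints them after the substitution.
single=$(printf '%s\n' "$input" | sed -n '/derp/{s/e/E/;p;}')

# sed | sed versus one sed with two -e expressions.
chained=$(printf '%s\n' "$input" | sed 's/one/1/' | sed 's/two/2/')
merged=$(printf '%s\n' "$input" | sed -e 's/one/1/' -e 's/two/2/')

[ "$piped" = "$single" ] && echo "grep|sed folded: same output"
[ "$chained" = "$merged" ] && echo "sed|sed folded: same output"
```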

This list is by no means complete, but it already saves a lot of external processes. And seriously: think about writing the whole thing in pure bash or awk. There might be too little gain to justify pulling in all these tools if you aren't combining the many features they provide.

2 comments

I disagree. Unless it's slow enough to affect performance, I prefer to combine simple pipes.

The only time I have to use more complicated constructions is to get around the stupid problem that passing filenames with pipes is almost impossible to do safely.

I learned when writing bash scripts for cygwin under Windows that Windows doesn't support copy-on-write fork(). This means that any new process is extremely expensive to fork, because it copies the memory of the parent process, even if it doesn't need it.

As a result, I did as much as I could using only bash internals, and very rarely did I use pipes, because they always fork()'d a bash subshell in addition to whatever the process was.
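
To illustrate the subshell point, here is a small sketch (variable names and sample input are made up). With bash's default settings, each part of a pipeline runs in a subshell, so a counter updated inside a piped while loop is lost, whereas feeding the same loop from a here-string keeps it in the current shell:

```shell
#!/usr/bin/env bash
# Hypothetical three-line input.
lines=$(printf 'a\nb\nc')

# Pipeline: the while loop runs in a forked subshell, so its changes
# to piped_count are not visible afterwards.
piped_count=0
printf '%s\n' "$lines" | while read -r line; do
  piped_count=$((piped_count + 1))
done

# Here-string: no pipeline, so the loop runs in the current shell.
direct_count=0
while read -r line; do
  direct_count=$((direct_count + 1))
done <<< "$lines"

echo "piped: $piped_count, direct: $direct_count"   # piped: 0, direct: 3
```

This assumes the lastpipe shell option is off, which is the default.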

In about 7k lines of bash script written to completely automate the iterative development of two games with a shared game engine featuring dockerized server containers, I was able to avoid using sed in almost all of it. When I did use it (pretty sure only two places), it was basically for mass substitution of variables inside of text files, and the multi-line syntax worked very nicely for this, as I could form the sed line with a loop over all the variables I wanted to replace.
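
A loop-built sed invocation could look something like the sketch below. The variable names, the @NAME@ placeholder syntax, and the template text are all assumptions for illustration, not the original script, and this version uses repeated -e arguments rather than a multi-line sed script. Values containing |, & or backslashes would need escaping first.

```shell
#!/usr/bin/env bash
# Hypothetical settings to substitute into a template.
GAME_NAME="asteroids"
SERVER_PORT="7777"
vars=(GAME_NAME SERVER_PORT)

# Build one -e expression per variable; ${!name} is bash indirect
# expansion, i.e. the value of the variable whose name is in $name.
sed_args=()
for name in "${vars[@]}"; do
  sed_args+=(-e "s|@${name}@|${!name}|g")
done

# Hypothetical template line; a real script would read a file instead.
template='name=@GAME_NAME@ port=@SERVER_PORT@'
rendered=$(printf '%s\n' "$template" | sed "${sed_args[@]}")
echo "$rendered"   # name=asteroids port=7777
```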

It turns out for most bash scripts, you can do the most common sed substitutions using bash's rich variable substitution expressions. For instance, you see in a lot of bash scripts calling commands like "dirname" and "basename" to get the directory and filename of paths. There's a much faster way to do this in bash:

  path=/tmp/my/path/to/stuff.txt
  dir="${path%/*}"
  file="${path##*/}"
  other="${path/stuff/other}"
That dir line means "delete the shortest string that matches /* from the end of the string". That file line means "delete the longest string matching */ from the start of the string." That other line means "replace the word stuff with the word other in this string".
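
A quick way to check these expansions against the external tools, plus two related forms (%.* and ##*.) that are not in the comment above but follow the same pattern rules. The "bare" example is an added edge case worth knowing about:

```shell
#!/usr/bin/env bash
path=/tmp/my/path/to/stuff.txt

dir="${path%/*}"       # /tmp/my/path/to
file="${path##*/}"     # stuff.txt
stem="${file%.*}"      # stuff  (shortest ".*" removed from the end)
ext="${file##*.}"      # txt    (longest "*." removed from the start)

# Same results as the external commands for this path.
[ "$dir" = "$(dirname "$path")" ] && echo "dir matches dirname"
[ "$file" = "$(basename "$path")" ] && echo "file matches basename"

# Edge case: with no slash in the value, % leaves it unchanged,
# whereas dirname would print ".".
bare=stuff.txt
bare_dir="${bare%/*}"
echo "$bare_dir"       # stuff.txt
```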

There's always find -print0, and many core tools support -z (-0 for xargs). But you have a point, of course; YMMV applies.
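
For reference, a minimal sketch of the NUL-delimited approach. The temporary directory and file names are invented for the demo; -print0 and xargs -0 are available in the GNU and BSD tools, and read -d '' is a bash feature.

```shell
#!/usr/bin/env bash
# Throwaway directory with awkward file names (a space and a newline).
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt" "$tmp/with"$'\n'"newline.txt"

# find emits NUL-terminated names; xargs -0 splits only on NUL.
# sh -c 'echo $#' reports how many arguments xargs passed along.
xargs_count=$(find "$tmp" -type f -print0 | xargs -0 sh -c 'echo $#' sh)

# The same stream consumed with bash builtins only.
loop_count=0
while IFS= read -r -d '' f; do
  loop_count=$((loop_count + 1))
done < <(find "$tmp" -type f -print0)

echo "xargs saw $xargs_count files, loop saw $loop_count"
rm -rf "$tmp"
```

Both counts should be 3 here; a newline-delimited pipeline would see the third file as two names.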

There's always some reason not to write "the perfect" command: either you're in a hurry or you just don't care about it.

Considering the commented-out valid code, it looks like he was saving time by just adding another "|" and checking the output, instead of improving the existing command and checking the output. Something that I do very often as well.

Glancing over this I thought it said: "Either being hungry or...". Nodding in agreement, I then realised my mistake.