
by vram22 2667 days ago

Well, I'm a few days late in writing my final point, due to being busy with other work. I know you've probably left this thread by now, as I didn't see any replies to my other challenges to you (about your misconceptions, about your calling some of my points "wrong" without substantiating why, and about your outright waffling, using terms like "could", "should", "maybe", etc., that I referred to elsewhere in the thread). But as I said, I'm not making these replies just for you, but for others too, and also, because you made accusations against me, to vindicate myself (although I do not need to do it, and the choice to do it or not is solely mine - it's just that I choose to do so this time). So here goes - my last comment in this largely futile thread:

I said I would point out a "fundamental flaw" in your points. The flaw is this:

You thought (and said) that my shell solution was an awk solution. That is wrong. It is a shell solution (and not an awk solution) for multiple reasons, which anyone slightly past the beginner stage with awk and shell should easily have known, if they had their fundamentals clear, which implies that you do not. It is a shell solution because:

1) The entire script is a pipeline (which is obvious from the pipe signs used, if you knew your stuff and had paid attention to the code before writing your first reply). awk does not have the pipeline operator (in the sense of sending the output of the previous command to the input of the next command). That itself should have told you that it is a shell script, not an awk script. There is an awk command embedded in the shell script, but that is very different from saying that it is an awk script.

2) You said elsewhere in this thread, in reply to kazinator:

>That I did not know - thank you. We here still think of awk as fundamentally different from other pipeline facilities such as tr, sed, sort, uniq, and so on, but I can see why it could, perhaps should, be thought of as being "shell".

That statement of yours above is wrong on two counts:

a) awk is not fundamentally different from tr, sed, sort, uniq, etc. It is a Unix command-line command like any other. The fact that it happens to be what you and some others may call a full programming language (not a well-defined term, anyway) does not make it any less of a command-line command. A tool can be both of those at the same time, and awk is. So is Perl. So is Python. So are many other languages. In fact, as someone else said and I hinted at, sed may be a Turing-complete language. So does that suddenly make my script a sed script, just because I used sed in it? But I used awk in it too. So should it be called an awk-sed script? But I used sort too. So now should I call it an awk-sort-sed script? See what I am getting at? No, it should just be called a shell script, because that is what it is. The shell is a high-level language that orchestrates other programs via its syntax and operators. (See below about the shell's operands being whole programs.) You claimed that the main work of my script (the word counting) was done in awk, and the results piped to other commands, therefore it was an awk solution. But it is the shell that is doing the piping, not awk! awk cannot do such orchestration, at least not easily, not without resorting to the "system()" library function it has, but that is again implemented using the shell (and other stuff, like the fork and exec system calls - I'm simplifying here).
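As a rough illustration of that point, here is a minimal sketch of a word-frequency pipeline. It is not the script discussed in this thread; the function name word_freq and the sample input are made up for the example. awk does the counting, but it is just one stage among several, and it is the shell that connects the stages:

```shell
# Hypothetical word-frequency pipeline (illustrative; not the script from the thread).
word_freq() {
  tr -cs '[:alpha:]' '\n' |          # split stdin into one word per line
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" match
    awk 'NF { count[$0]++ } END { for (w in count) print count[w], w }' |
    sort -k1,1nr -k2,2               # highest count first, ties in alphabetical order
}

# Usage: feed text on stdin, like any other filter.
printf 'the cat and the hat and the bat\n' | word_freq
```

Swapping the awk stage for, say, "sort | uniq -c" would leave it just as much a shell script as before.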

All shell scripts can consist of any command or combination (not just pipelines [1]) of commands, irrespective of the type of the command, whether it is a programming language or not, what language it is written in, etc. In fact there is not even a requirement that the commands used in shell scripts should all be filters; that requirement is only for shell pipelines. [2]

[1] A shell script can: consist of just a sequence of (one or more) command(s), terminated either by semicolons or newlines, or both; consist of one or more pipelines only; consist of any combo of the preceding. And also other variations, including at least an ampersand (&) terminating the command (or pipeline), which makes the preceding command or pipeline run asynchronously from the rest of the overall command/pipeline/script, if any, i.e. in the "background", as we say in Unix.

[2] Here is a shell script that demonstrates many of the above points:

  # a_script.sh
  foo1 # run foo1
  foo2; foo3 # run foo2, then foo3
  foo4 & # run foo4 in the background
  foo5 > f1 # run foo5, redirect its stdout to f1
  foo6 < f1 | foo7 arg1 arg2 | foo8 arg3 arg4 arg5

Any of those foo* commands in the script could be any command at all, without any restrictions. Only the commands in the pipeline on the last line of the script even need to obey the conventions of filters that I described above. The commands on the preceding lines do not.
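For anyone who wants to try it, here is one possible runnable variant of the same shape, with real commands standing in for foo1 to foo8. The directory and file names are assumptions made up for the example. Note that only the commands on the pipeline line read standard input; echo and sleep do not:

```shell
# Illustrative stand-ins for foo1..foo8; all file names here are made up.
workdir=$(mktemp -d)

echo start > "$workdir/log"                               # a lone command
echo one >> "$workdir/log"; echo two >> "$workdir/log"    # two commands in sequence
sleep 1 &                                                 # run in the background
bg_pid=$!                                                 # remember its process ID
printf 'b\na\nb\n' > "$workdir/f1"                        # redirect stdout to a file
# A pipeline of filters: read f1, count duplicates, reorder the columns.
sort < "$workdir/f1" | uniq -c | awk '{ print $2, $1 }' > "$workdir/counts"
wait "$bg_pid"                                            # wait for the background job

cat "$workdir/counts"
```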

All this is part of the flexibility, beauty and power of the shell, whether used in scripts or on the command line. Which brings me to my next key point: there is essentially (almost) no difference between typing commands interactively at the shell prompt, and invoking the same commands from within shell scripts that are run by the shell. The exact same syntax with the exact same semantics can be used (for all practical purposes, maybe with a few exceptions, in both modes, interactive or script).

In fact, you can even type for and while loops [3] (including with redirection of their input and output) at the shell prompt. (You can even type if statements at the shell! Same for case statements.) I do it all the time, for throwaway "scripts" such as ones to monitor the execution of some processes, and so on. And many standard Unix books - like the classic book, The Unix Programming Environment (UPE), that I mentioned in this thread - show that in examples.
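Here is a small sketch of what that can look like. The file names are invented for the example; everything in it can be typed line by line at the prompt or saved in a script, with the same result either way:

```shell
# Sketch only: the directory and file names are made up for the example.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/in.txt"

# A while loop with both its stdin and its stdout redirected.
n=0
while read -r line
do
  n=$((n + 1))
  echo "$n: $line"
done < "$tmpdir/in.txt" > "$tmpdir/out.txt"

# A for loop containing an if statement and a case statement.
for f in "$tmpdir"/*.txt
do
  if [ -s "$f" ]; then                 # only report non-empty files
    case "$f" in
      *out.txt) echo "numbered copy: $(wc -l < "$f") lines" ;;
      *)        echo "original: $(wc -l < "$f") lines" ;;
    esac
  fi
done
```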

Another thing that UPE says and shows is something to the effect that "the shell is a very high level language - its operands are whole programs (emphasis mine)". That is why we can do things like the example in [3] below, but first, another example:

  # : is a built-in that evaluates to true, saving having to
  # run the true command from disk each time
  while :
  do
    ps -aef | grep foo
    sleep 10
  done

This is a script (but can equally well be typed directly at the shell prompt, for the reason I gave above) that monitors the execution of the foo command. Better versions of it, using the while and until commands of the shell, are shown in the book; you can look them up. One version may start like:

[3]

  while ps -aef | grep foo
  do
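     # note: grep foo can match its own entry in the ps output; grep '[f]oo' avoids that
     sleep 10 # or do something more useful; an empty loop body is a syntax error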
  done

which shows the point about the shell's operands being whole programs - the "ps -aef | grep foo" part is used as an operand in the while condition - and it is a pipeline, bigger even than "a whole program"! This works because the exit code of the pipeline is the exit code of the last command in it, which is grep, so the while condition is true if grep finds a match of foo in the ps output.
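The exit-code rule is easy to check for yourself. This is a minimal sketch, assuming the shell's default behaviour (bash, for one, has a "set -o pipefail" option that changes the rule). fake_ps is a hypothetical stand-in for "ps -aef", so that the example does not depend on what is actually running:

```shell
# By default, a pipeline's exit status is that of its last command.
if true | false; then first=success; else first=failure; fi
if false | true; then second=success; else second=failure; fi
echo "true | false: $first"
echo "false | true: $second"

# Hypothetical stand-in for "ps -aef" with a fixed, made-up process list.
fake_ps() {
  printf 'root 1 init\nroot 42 sshd\nroot 77 cron\n'
}

# grep -q prints nothing; only its exit status drives the condition.
if fake_ps | grep -q sshd; then found_sshd=yes; else found_sshd=no; fi
if fake_ps | grep -q foo; then found_foo=yes; else found_foo=no; fi
echo "sshd listed: $found_sshd, foo listed: $found_foo"
```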

b) You called tr, sed, sort, uniq and so on, "pipeline facilities". They are that, but not just that. "Pipeline facilities" is, anyway, a non-standard term; a better and more standard term would be just "Unix commands" or "filters". Filters is a standard term for programs that read either the files named as arguments or their standard input, process the input in some way, and write the results to standard output, thereby enabling the whole Unix pipeline paradigm. Before and apart from being "pipeline facilities", they are also simply normal commands, or programs. Any of those commands can be used either standalone or in a pipeline. In fact there are other ways of using them too: for example, you can invoke any of those commands (as well as any other executable) as a child process from some other program you write, in C, Python or another programming language. You are creating distinctions where none exist, for who knows what reason.
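To make the filter convention concrete, here is a short sketch using sort. The temporary file and the variable names are made up for the example. The same command is run standalone with a filename argument, standalone with redirected standard input, and as a stage in a pipeline, and gives the same result each time:

```shell
# Sketch: one filter, three ways of feeding it the same input.
tmpfile=$(mktemp)
printf 'pear\napple\nfig\n' > "$tmpfile"

by_arg=$(sort "$tmpfile")           # standalone, input named as an argument
by_stdin=$(sort < "$tmpfile")       # standalone, input redirected to stdin
by_pipe=$(cat "$tmpfile" | sort)    # as a stage in a pipeline

if [ "$by_arg" = "$by_stdin" ] && [ "$by_stdin" = "$by_pipe" ]; then
  echo "same output all three ways"
fi
rm -f "$tmpfile"
```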