by CarolineW 2667 days ago
I read your entire post, and I didn't miss any of the points you make. I simply disagree with you.

Quoting from your post:

    And here is my initial solution in UNIX shell:

    # bentley_knuth.sh

    # Usage:
    # ./bentley_knuth.sh n file
    # where "n" is the number of most frequent words 
    # you want to find in "file".
    
    awk '
        {
            for (i = 1; i <= NF; i++)
                word_freq[$i]++
        }
    END     {
                for (i in word_freq)
                    print i, word_freq[i]
            }
    ' < $2 | sort -nr +1 | sed $1q

So you invoke awk, and then run the output of awk through sort and sed.

You're doing all the word counting in awk.
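
For anyone trying the quoted script on a current system: the `sort -nr +1` key syntax is obsolete, and many current `sort` implementations no longer accept it. Here is a minimal sketch of the same awk-then-sort-then-sed structure, assuming a `sort` that supports the POSIX `-k` option. The function name `awk_top_words` is made up for illustration and is not from the quoted post.

```shell
# Hypothetical helper, not from the quoted post: same structure as the
# quoted script, with the obsolete "sort +1" key replaced by "-k2,2nr".
# Usage: awk_top_words n file
awk_top_words() {
    awk '
        {
            # Count every whitespace-separated field as a word.
            for (i = 1; i <= NF; i++)
                word_freq[$i]++
        }
        END {
            # Emit "word count" pairs in arbitrary order.
            for (w in word_freq)
                print w, word_freq[w]
        }
    ' < "$2" | sort -k2,2nr | sed "${1}q"
}
```

Note that the counting, the part that does the real work, still happens entirely inside the awk program; sort and sed only order and truncate its output. This version is also case-sensitive and keeps punctuation attached to words, as the quoted script does.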

Yes, you're invoking awk from a shell script, but that's really not the same thing as "using shell." McIlroy’s solution is genuinely shell:

    tr -cs A-Za-z '
    ' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed ${1}q

"awk" is generally accepted as a full programming language, whereas "tr", "sort", "uniq", and "sed" are command line utilities. I don't think "awk" classes as a command line utility, so I don't class your solution as "shell".
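
To make the contrast concrete, here is a sketch of the same pipeline wrapped in a function so it can be fed a sample on standard input. The name `top_words` is hypothetical, and `'\n'` stands in for the literal quoted newline in the original, which assumes a `tr` that understands that escape (POSIX `tr` does).

```shell
# Hypothetical wrapper around the pipeline quoted above.
# Usage: some_command | top_words n
top_words() {
    tr -cs 'A-Za-z' '\n' |   # turn every run of non-letters into one newline
    tr 'A-Z' 'a-z' |         # fold everything to lower case
    sort |                   # bring identical words together
    uniq -c |                # collapse each run into "count word"
    sort -rn |               # highest counts first
    sed "${1}q"              # stop after the first n lines
}

# Example: the most frequent word in a short sample.
printf 'The cat and the dog. The end.\n' | top_words 1
```

Each stage is a single-purpose filter and no stage holds a user-written program, which is the distinction I am drawing. The exact padding in front of the count depends on the local `uniq` implementation.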

Perhaps you don't agree, perhaps you think "awk" is a command line utility. If so, then we'll agree to disagree.


Wow. Multiple misunderstandings on your part in a single, fairly short comment. I will of course reply and substantiate what I said as best I can, but it's late here, and I prefer to answer an argument thoroughly. I hope to do it by tomorrow night my time, or a day later if I'm too busy. I think the reply link should still be alive then.

Meanwhile, you might want to scrutinize your own reply (the one I am replying to here) and think a bit more deeply about what might be wrong with it. Until my full reply, here are a couple of hints:

Hint 1:

>"awk" is generally accepted as a full programming language, whereas "tr", "sort", "uniq", and "sed" are command line utilities.

- A tool can very much be both a full programming language and a command-line utility at the same time. awk falls into that category [1], as do many other Unix commands. Who made up a rule that it cannot be both at the same time? You?

Hint 2:

Check out your line:

>So you invoke awk, and then run the output of awk through sort and sed.

and compare and contrast its meaning with the meaning of your few lines immediately below it, including the one that says "McIlroy’s solution is genuinely shell:". Try to see the similarity/difference/contradiction.

[1] Finally, read the book The Unix Programming Environment, a classic, by Kernighan and Pike (Unix pioneers). I cut my Unix teeth on it, years ago, although, of course, years do not mean I am right and you are wrong. Facts do. There are chapters in the book on awk and sed. And IIRC they come under the topic of filters (maybe even the chapter name is that) a.k.a. command-line utilities, although not every such utility needs to be, or is, a filter. I think you have some confusion about terms and their meanings, and/or are assigning your own meaning, even though you use words like "generally accepted".

Also skim this article (published by me, years ago, on IBM developerWorks) to get your fundamentals clearer:

Developing a Linux command-line utility:

https://jugad2.blogspot.com/2014/09/my-ibm-developerworks-ar...

Enough for today - will do follow-up comment as I said, if needed, in a day or two.

Then we'll agree to disagree. I think you are wrong on so many points here, it's clear we're not going to agree, and probably won't find common ground.

Thank you, by the way, for your references to various published material. FWIW, I've worked with BCPL, C, AWK, C++, Unix, Linux, GNU, and much, much more, for the last four decades or so, so I'm not inexperienced, and I have read most of the classics. That also doesn't mean I'm right, but it does mean that I have a basis for my opinions.

So thank you for your offer to school me, but I'll decline, and, as I say, accept that we disagree.

>Then we'll agree to disagree. I think you are wrong on so many points here

Thanks for casting aspersions without even so much as a mention of what the "so many points" are that I am supposedly wrong on.

When I said upthread that you have misunderstandings, I at least mentioned some and hinted at or gave a clue to what the others were.

Also interesting that when kazinator said to you that awk is part of shell, you meekly accepted that he was right, thereby contradicting your earlier claim that my shell solution was not a shell solution but an awk solution. And in that same comment ( https://news.ycombinator.com/item?id=19279030 ) accepting it, you still seem to be neither here nor there, by your own words: you say you "can see why it could, perhaps should, be though (sic) of as being 'shell'", but might "always have trouble thinking of it as such".

Just happened to see your reply here before I went off to sleep:

https://news.ycombinator.com/item?id=19276012

Interesting and maybe significant that you say: "Then we'll agree to disagree. I think you are wrong on so many points here, it's clear we're not going to agree, and probably won't find common ground."

First, interesting that you say "I think you are wrong on so many points here ..." but do not deign to offer any points to back up your statement. Kind of a cop-out, looks like. Anyone can say someone else is wrong; such statements do not carry any weight unless backed up with something more substantial.

And about your "four decades", like I said in a previous comment, years or age do not matter, facts do. I care not a whit if the person I am arguing with has 4 years or 4 decades or 4 centuries of experience. They (or I) can still be wrong (or right) about any specific topic we happen to be arguing about. I've been known to acknowledge that I was wrong, in arguments with people less experienced than me, many times, and vice versa has happened too.

Nor is finding "common ground" the goal (this is not some sort of compromise between political parties, it's a technical argument). Getting things right is the goal. For which, sometimes one party or the other may have to admit they are wrong - including me. Just that I do not think I am wrong in this case.

Will still write my fuller reply as I said earlier, to keep my word, and to make the picture more clear for other readers, since you have made these statements, even if you have hastily left the conversation.

OK, so as kazinator has pointed out, awk is now a mandatory part of Posix, and so is genuinely a part of "shell". My reply there says that I and my colleagues still think of awk as fundamentally different from other pipeline facilities such as tr, sed, sort, uniq, and so on, but I can see why it could, perhaps should, be though of as being "shell".

So it's shell. I might, however, given my background, and remembering as I do its first introduction, always have trouble thinking of it as such.

It's not a cop-out, we disagree.

> Nor is finding "common ground" the goal (this is not some sort of compromise between political parties, it's a technical argument).

We disagree. When there is a disagreement, finding what you agree on is the first step in finding where the lines of reasoning diverge. Finding common ground is the first step in resolving differences.

> Getting things right is the goal.

Sometimes in software there are judgement calls. Maybe this is one of them, maybe our definitions differ. Sometimes definitions differ because of context or experience. In each case, the terms used are not right or wrong, they are definitions that are useful in the context.

> For which, sometimes one party or the other may have to admit they are wrong - including me.

This is not an "I'm right, you're wrong" situation. By my experience, in my context, what you wrote would be called a "shell solution" in the same sense as the original command-line solution would be called a "shell solution."

You think that invoking AWK from the command line means that it's still a command-line script. Your definition of the terms means that you accept that invoking AWK still lets you call it a "shell solution."

I think that is fundamentally and structurally different from using command line utilities such as tr, sed, sort, and uniq.

So my position is clear - your solution that you call "shell" is not, in my opinion, just "shell". To me, your solution is an AWK solution, and you feed the output from your AWK program through shell utilities.

You are using the terms in a manner that is different from how I'm using them, that much is now clear.

Do you agree that you have written a shell script that invokes a program written in AWK?

Would it be different if you wrote a shell script that invoked a C program by calling a C interpreter? Would you still call it a "shell solution to the problem"?

Does it matter? Really? I've made clear why I've said that I don't class your solution as being shell, why do you care?