by CarolineW
2667 days ago
I read your entire post, and I didn't miss any of the points you make. I simply disagree with you. Quoting from your post:
And here is my initial solution in UNIX shell:
# bentley_knuth.sh
# Usage:
# ./bentley_knuth.sh n file
# where "n" is the number of most frequent words
# you want to find in "file".
awk '
{
    for (i = 1; i <= NF; i++)
        word_freq[$i]++
}
END {
    for (i in word_freq)
        print i, word_freq[i]
}
' < $2 | sort -nr +1 | sed $1q
So you invoke awk, and then run the output of awk through sort and sed. You're doing all the word counting in awk. Yes, you're invoking awk from a shell script, but that's really not the same thing as "using shell." McIlroy’s solution is genuinely shell:
tr -cs A-Za-z '
' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
"awk" is generally accepted as a full programming language, whereas "tr", "sort", "uniq", and "sed" are command line utilities. I don't think "awk" classes as a command line utility, so I don't class your solution as "shell". Perhaps you don't agree, perhaps you think "awk" is a command line utility. If so, then we'll agree to disagree.
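For comparison, here is a minimal sketch of the two approaches side by side, wrapped as shell functions that read text on stdin. The function names top_words_awk and top_words_pipeline are made up for illustration and appear nowhere in the thread. The awk variant replaces the obsolete "sort -nr +1" with the POSIX key form "sort -k2,2nr" plus a tie-break on the word, which is an assumption about the intended behaviour, not the quoted script itself. The awk variant also does not fold case or strip punctuation, so the two are only expected to agree on simple lowercase input.

```shell
# Hypothetical helper names; neither function appears in the thread.

# awk-based counting, modelled on the quoted script.
# Assumption: "sort -nr +1" is replaced by the POSIX form "-k2,2nr",
# with "-k1,1" added as a tie-break on the word.
top_words_awk() {
    # $1 = number of words to print; text is read from stdin
    awk '
    {
        for (i = 1; i <= NF; i++)
            word_freq[$i]++
    }
    END {
        for (w in word_freq)
            print w, word_freq[w]
    }' | sort -k2,2nr -k1,1 | sed "${1}q"
}

# McIlroy-style pipeline built from tr, sort, uniq and sed only.
top_words_pipeline() {
    # $1 = number of words to print; text is read from stdin
    tr -cs A-Za-z '\n' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed "${1}q"
}
```

Either function can be fed a file, for example "top_words_pipeline 10 < file". The awk variant prints the word first and the count second, while the pipeline prints the count first, padded by uniq -c.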
Meanwhile, you might want to scrutinize your own reply (the one to which I am replying here) and think a bit more deeply about what might be wrong with it. Until my full reply, which will come later, here are a couple of hints:
Hint 1:
>"awk" is generally accepted as a full programming language, whereas "tr", "sort", "uniq", and "sed" are command line utilities.
- a tool can very much be both a full programming language and a command-line utility at the same time. awk falls into that category [1], as do many other Unix commands. Who made up a rule that it cannot be both at the same time? You?
Hint 2:
Check out your line:
>So you invoke awk, and then run the output of awk through sort and sed.
and compare and contrast its meaning with the meaning of your few lines immediately below it, including the one that says "McIlroy’s solution is genuinely shell:". Try to see the similarity/difference/contradiction.
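To make Hint 1 concrete, here is a small sketch of awk used as an ordinary filter in the middle of a pipeline, reading stdin and writing stdout like tr, sort and sed do. The function name longest_word is hypothetical and chosen only for this illustration.

```shell
# awk as a plain filter: it reads stdin, writes stdout, and sits
# between other utilities in the pipeline.
# longest_word is a hypothetical helper name, not from the thread.
longest_word() {
    tr -cs A-Za-z '\n' |
    awk '{ print length($0), $0 }' |
    sort -rn |
    sed 1q
}
```

It prints the length of the longest word followed by the word itself, for example when run as "longest_word < file".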
[1] Finally, read the book The Unix Programming Environment, a classic, by Kernighan and Pike (Unix pioneers). I cut my Unix teeth on it, years ago, although, of course, years do not mean I am right and you are wrong. Facts do. There are chapters in the book on awk and sed. And IIRC they come under the topic of filters (maybe even the chapter name is that) a.k.a. command-line utilities, although not every such utility needs to be, or is, a filter. I think you have some confusion about terms and their meanings, and/or are assigning your own meaning, even though you use words like "generally accepted".
Also skim this article (published by me, years ago, on IBM developerWorks) to get your fundamentals clearer:
Developing a Linux command-line utility:
https://jugad2.blogspot.com/2014/09/my-ibm-developerworks-ar...
Enough for today - will do follow-up comment as I said, if needed, in a day or two.