| OK, Unix was probably available to Knuth, but the task given to Knuth was not to promote any already written programs! Had he done so, it would have been claimed that he failed to do what was requested of him. Even today, if you were given exactly the same task, with the goal of producing the most efficient solution while respecting the limitations of the hardware available to you, and of producing a self-contained program (e.g. because your algorithm should run with hundreds of billions of words of input), you'd still probably end up with something closer to what Knuth did than what McIlroy did. Which doesn't mean that McIlroy's solution is not brilliant. But it's also not obvious, i.e. not something a "normal user" would "know": - even if you knew that "tr command translates the characters to characters" did you know that you could (and must) write tr -cs A-Za-z '
'
to perform the first of the six operations? What -c does? What -s does? That you could, and even had to, form the command line so that it contains the newline? I bet a lot of Unix users of today would still not know that one. - did you know what the fifth line, "sort -rn", was supposed to do? Would you know that you're to sort "numerically" (-n) and that it would "work"? - "sed ${1}q" how many people even today would know that one? And after all that, the first of the two sorts needs to sort a file that is as big as the original input! If you have hundreds of gigabytes of input, you'd need at least that much additional space just to sort it. McIlroy's approach is a good one for a one-off program or for not-too-big inputs, provided you know that you can use these commands the way he used them. But it's still not "a program" in the same sense that Knuth's program is. Knuth's algorithm would, unsurprisingly, handle huge inputs orders of magnitude more efficiently. McIlroy was aware of that and intentionally hand-waved it away in his "critique." Read the original text: https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-... But the major point is still: Knuth's task was not "use the existing programs" (or libraries) but "write a program" that does what's said to be done: The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used. And once that is being done, McIlroy's code would still be both less readable, less efficient and worse overall. Which, on the other hand, doesn't mean that for some purposes "worse" isn't "better": https://yosefk.com/blog/what-worse-is-better-vs-the-right-th... But for some purposes only "better" works and "worse" simply doesn't, e.g. when the scale of the problem is big enough. And Knuth teaches us how to solve such harder problems. And he presents the complete solution vs. doing tricks (just call that library/executable, whose implementation and real limitations I'm going to avoid explaining to you). And given how often the difference between showing how something is implemented (most efficiently) and the "just use pre-written tool X" approach is misunderstood, I understand even more why Knuth uses assembly in his "The Art of Computer Programming" books. |
> - even if you knew that "tr command translates... "sed ${1}q" how many people even today would know that one?
Are you suggesting it's ever been more likely for people to understand how to manage a trie structure in Pascal than use Unix command line tools? Or look flags up in the manpages?
Personally speaking, I'm comfortable doing both, but can't imagine many scenarios where I'd rather have ten pages of a custom data structure than six lines of shell. (And they all involve either high volumes of data or situations where I can't easily get to the process-level tools.)
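For reference, here is roughly what those six lines look like with each flag annotated. Treat it as a sketch rather than a faithful reproduction: the function name `wordfreq` is my own, and I've written `'\n'` instead of the literal quoted newline, which assumes a `tr` that understands that escape (POSIX specifies it).

```shell
#!/bin/sh
# wordfreq K: print the K most frequent words read from stdin.
# "wordfreq" is a hypothetical name; the six stages follow the pipeline discussed above.
wordfreq() {
    tr -cs 'A-Za-z' '\n' |  # -c: complement the set (all non-letters); -s: squeeze repeats into one newline
    tr 'A-Z' 'a-z' |        # fold upper case to lower case
    sort |                  # bring identical words next to each other
    uniq -c |               # collapse duplicates, prefixing each word with its count
    sort -rn |              # -n: compare counts numerically; -r: largest first
    sed "${1}q"             # print lines up to line K, then quit
}
```

Usage would be something like `wordfreq 10 < book.txt`, where `book.txt` is a placeholder for any text file. Every flag here is documented in the corresponding manpage.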
> The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.
If you're including the code that provides the surface abstractions, where do you draw that line? If the code for sort, uniq, etc. is fair game, why not the code for the shell that provides the pipelining, the underlying OS, the file system, the firmware on the disk? After all, who's to say that the programs in the pipeline don't just run one after another with temporary files written to disk, rather than in parallel? (Which I've seen happen in reality.)
The same is true for the other side, of course. The 'fair comparison' could easily require Knuth's solution to include the source for weave/tangle, TeX/Metafont/CMR, the OS, etc.
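To make the temporary-files point concrete: as a sketch, the same six stages can be run strictly one after another, each writing its complete output to disk before the next one starts. The output is identical; only the scheduling and the disk usage differ. The function name `wordfreq_staged` is hypothetical, and `mktemp -d` is an assumption on my part (it is widely available but not part of POSIX).

```shell
#!/bin/sh
# wordfreq_staged K: the six stages run sequentially through temporary files
# instead of concurrently through pipes. Hypothetical name; illustration only.
wordfreq_staged() {
    dir=$(mktemp -d) || return 1            # assumes mktemp -d is available
    tr -cs 'A-Za-z' '\n' > "$dir/1"         # stage 1 reads stdin: one word per line
    tr 'A-Z' 'a-z' < "$dir/1" > "$dir/2"    # stage 2: lower-case everything
    sort < "$dir/2" > "$dir/3"              # stage 3: group identical words
    uniq -c < "$dir/3" > "$dir/4"           # stage 4: count each group
    sort -rn < "$dir/4" > "$dir/5"          # stage 5: highest counts first
    sed "${1}q" < "$dir/5"                  # stage 6: keep the first K lines
    rm -rf "$dir"                           # clean up the intermediate files
}
```

Nothing in the six-line source text tells you which of the two execution models you are getting, which is the sense in which the shell and OS are part of the "program" too.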
> And once that is being done, McIlroy's code would still be both less readable, less efficient and worse overall.
What definition of 'worse' are you using?
* I expect sort/uniq/tr/sed to be better tested and better understood than a bespoke program.
* If there are issues with the program, it'll be easier to find skills/time to maintain a shell pipeline than custom trie-balancing code written in Pascal. (Sitting alongside a prose description of the same.)
* The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.
* It's possible that the custom solution runs faster, but not obvious without testing. (None of which is worthwhile outside of demonstrated need.)
Point being: it's very easy to find a definition of 'worse' that applies more to the custom solution than to the pipeline.
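On the "not obvious without testing" point, the measurement is cheap to set up. As a sketch, here is a variant that counts words in an awk associative array, so only the distinct words get sorted rather than the whole input. To be clear, this is a stand-in for "a custom data structure" and not Knuth's trie; the name `wordfreq_hash` is hypothetical.

```shell
#!/bin/sh
# wordfreq_hash K: count words in an awk associative array, then sort
# only the distinct words. A stand-in for a custom structure, NOT Knuth's trie.
wordfreq_hash() {
    tr -cs 'A-Za-z' '\n' |  # one word per line
    tr 'A-Z' 'a-z' |        # fold to lower case
    awk '{ count[$0]++ } END { for (w in count) print count[w], w }' |  # one pass, one counter per distinct word
    sort -rn |              # sort the (much smaller) list of distinct words by count
    sed "${1}q"             # keep the first K lines
}
```

Running `time wordfreq_hash 10 < input.txt` next to the six-line pipeline on the same file (`input.txt` being whatever data you actually care about) would show whether the difference matters at your scale. Note that words with equal counts may come out in a different order between the two.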