Hacker News new | ask | show | jobs
by acqq 2748 days ago
OK, Unix was probably available to Knuth, but the task given to Knuth was not to promote any already written programs! Had he done so it would be claimed that he missed to do what was requested of him to do.

Even today, if you would get the exactly same task, with the goal to make the most efficient solution when you have to care about the limitations of hardware available to you and to produce the self contained program (e.g. because your algorithm should run with hundreds of billions of words of input) you'd still at the end probably produce something closer to what Knuth did then what McIlroy did.

Which doesn't mean that it's not brilliant. But it's also not obvious, i.e. not something a "normal user" would "know":

- even if you knew that "tr command translates the characters to characters" did you know that you could (and must) write

   tr -cs A-Za-z'
   '
to perform the first operation from 6? What the -c does? What the -s does? That you could and even had to form the command line to contain the newline? I bet a lot of Unix users of today would still not know that one.

- did you know what the fifth line was supposed to do "sort -rn" Would you know that you're to sort "numerically" (-n) and that it would "work"?

- "sed ${1}q" how many people even today would know that one?

And after all that, the first of two sorts needs to sort the file that is as big as the original input! If you have hundreds of gigabytes of input, you'd have to have at least that much more just to sort it. McIlroy's approach is a good one for one-off program or not too big input processing, and if you knew that you can use these commands as he used them. But it's still not "a program" in the same sense a Knuth's program is.

Knuth's algorithm would, unsurprisingly, handle the huge inputs orders of magnitude more efficiently. And that is what McIlroy was aware of and intentionally hand-waved it in his "critique." Read the original text:

https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-...

But the major point is still: Knuth's task was not "use the existing programs" (or libraries) but "write a program" that does what's said to be done: The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.

And once that is being done, McIlroy's code would still be both less readable, less efficient and worse overall.

Which on the other side also doesn't mean that for some purposes "worse" isn't "better": https://yosefk.com/blog/what-worse-is-better-vs-the-right-th... But for some purposes, on "better" works and "worse" simply doesn't, e.g. when the scale of the problem is big enough. And Knuth teaches us how to solve such, harder problems. And presents the complete solution v.s. doing tricks (just call that library/executable which I'm going to avoid to explain you how it is implemented and what its limitations really are).

And giving the misunderstanding in the difference between showing how something is implemented (most efficiently) and that "just use pre-written tool X" approach, I understand even more why Knuth uses assembly in his "The Art of Computer Programming" books.

1 comments

> But it's also not obvious, i.e. not something a "normal user" would "know":

> - even if you knew that "tr command translates... "sed ${1}q" how many people even today would know that one?

Are you suggesting it's ever been more likely for people to understand how to manage a trie structure in Pascal than use Unix command line tools? Or look flags up in the manpages?

Personally speaking, I'm comfortable doing both, but can't imagine many scenarios where I'd rather have ten pages of a custom datastructure than six lines of shell. (And they all involve either high volumes of data or situations where I can't easily get to the process-level tools.)

> The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.

If you're including the code that provides the surface abstractions, where do you draw that line? If the code for sort, uniq, etc. is fair game, why not the code for the shell that provides the pipelining, the underlying OS, the file system, the firmware on the disk? After all, who's to say that the programs in the pipeline don't just run one after another with temporary files written to disk, rather than in parallel? (Which I've seen happen in reality.)

The same is true for the other side, of course. The 'fair comparison' could easily require Knuth's solution to include the source for weave/tangle, TeX/Metafont/CMR, the OS, etc.

> And once that is being done, McIlroy's code would still be both less readable, less efficient and worse overall.

What definition of 'worse' are you using?

* I expect sort/uniq/tr/sed to be more well tested and understood than a bespoke program.

* If there are issues with the program, it'll be easier to find skills/time to maintain a shell pipeline than custom trie-balancing code written in Pascal. (Sitting aside a prose description of the same.)

* The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.

* It's possible that the custom solution runs faster, but not obvious without testing. (None of which is worthwhile outside of demonstrated need.)

Point being: it's very easy to find a definition of 'worse' that applies more to the custom solution than to the pipeline.

> The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.

That argument points to the fact that your “view” of the whole topic changes the assumed definition of the problem that was given to Knuth to solve. Read once again the original text: he was supposed to illustrate how the “Literate programming” could be ised while writing a program which solves a given program. It was definitely not “write an example of calling the existing pre-written programs”.

And, of course, it was all in 1986, definitely not “to target the machine which can be found in the retail store in 2018.”

McIlroy already behaved as the goal had been different than it was.

> McIlroy already behaved as the goal had been different than it was.

How would you feel about McIlroy's solution if it was semantically exactly the same, but written in a literate approach? (Essentially a version of 'weave/tangle', but for shell scripts.)

How would you feel if somebody would present the “six steps clicks in Excel and SQL Server” that eventually produce the same result? The starting goal was simply not “show how to use and combine external programs.” Even if you need some kind of skill to combine them. It’s exactly the same kind of missed fullfilment of the given starting task.
The reason I asked about a literate programming version of the shell script is that it speaks directly to Knuth's original stated goal: "I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose"

In the context of that requirement, the it's the use of literate programming that's more of a concern than the specific implementation. (Which is why I asked about a literate version of the shell pipeline.)

Earlier in the thread, you also mention this concern around data volumes:

> If you have hundreds of gigabytes of input, you'd have to have at least that much more just to sort it. McIlroy's approach is a good one for one-off program or not too big input processing,

There, your concern is not justified by the stated requirements of the problem: "I did impose an efficiency constraint: a user should be able to find the 100 most frequent words in a twenty-page technical paper"

I do think McIlroy failed to solve the problem of demonstrating the value of literate programming, but I'm not sympathetic to arguments that he should've used more complex algorithms or relied on less library code. This is particularly the case when the additional complexity is only relevant in cases that don't fall into the given requirements.

(A literate program that uses SQL server or Excel might be an interesting read....)

> The reason I asked about a literate programming version of the shell script is that it speaks directly to Knuth's original stated goal: "I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose" In the context of that requirement, the it's the use of literate programming that's more of a concern than the specific implementation.

And McIlroy's "solution" is provably not the "best possible solution" if you are interested in the algorithms, algorithmic complexity, the resources used, you know, all the topics studied by people doing computer science. All these topics are still relevant today.

That is the domain that was of interest to both Bentley and Knuth, and McIlroy "sabotaged" the whole by presenting effectively only a list of calls to the stand-alone programs which he hasn't developed himself, which he avoided to present, and even without looking at them, just by analyzing what are the best possible implementations of these, every student of computer science can prove that McIlroy's solution is worse.

If you carefully read the original text (and if you understand the topics of computer science), you can recognize that McIlroy was aware of the algorithmic superiority of Knuth's solution.

Knuth definitely and provably won.