Hacker News new | ask | show | jobs
by atannen 2748 days ago
re: At the time, Unix wasn't even widely available outside Bell Labs and a few places; it definitely wasn't available to Knuth or most of the column's readers.

At that time, 1986, Unix was widely available - there were more than 100k Unix installations around the world by 1984. AT&T Unix Sys V and UCB 4.3bsd were available. Knuth was friendly with McIlroy, who was head of the Bell Labs computer science research group that begat Unix. Sun Microsystems was formed in 1982 - their boxes ran Unix, and Sun was a startup spun off from Stanford.

4 comments

Hmm interesting; I remember checking this for the time when TeX was written (1977, because of questions about building on top of troff) -- what I remember finding is that Unix wasn't widely available at colleges then. Perhaps things changed in the next 9 years. As far as I know, Knuth's access to computers at the time was still through Stanford's AI Lab (that's why the first version of TeX in 1977-1978 was written in SAIL; see also the comment for 14 Mar 1978 in http://texdoc.net/texmf-dist/doc/generic/knuth/errata/errorl...). Do you know if Unix was installed on Stanford lab computers by 1986? What was the distribution of these 100k Unix installations (academia/industry)?
>Sun Microsystems was formed in 1982 - their boxes ran Unix, and Sun was a startup spun off from Stanford.

Right. Even their name was derived from Stanford University Network:

https://en.wikipedia.org/wiki/Sun_Microsystems#History

OK, Unix was probably available to Knuth, but the task given to Knuth was not to promote any already written programs! Had he done so it would be claimed that he missed to do what was requested of him to do.

Even today, if you would get the exactly same task, with the goal to make the most efficient solution when you have to care about the limitations of hardware available to you and to produce the self contained program (e.g. because your algorithm should run with hundreds of billions of words of input) you'd still at the end probably produce something closer to what Knuth did then what McIlroy did.

Which doesn't mean that it's not brilliant. But it's also not obvious, i.e. not something a "normal user" would "know":

- even if you knew that "tr command translates the characters to characters" did you know that you could (and must) write

   tr -cs A-Za-z'
   '
to perform the first operation from 6? What the -c does? What the -s does? That you could and even had to form the command line to contain the newline? I bet a lot of Unix users of today would still not know that one.

- did you know what the fifth line was supposed to do "sort -rn" Would you know that you're to sort "numerically" (-n) and that it would "work"?

- "sed ${1}q" how many people even today would know that one?

And after all that, the first of two sorts needs to sort the file that is as big as the original input! If you have hundreds of gigabytes of input, you'd have to have at least that much more just to sort it. McIlroy's approach is a good one for one-off program or not too big input processing, and if you knew that you can use these commands as he used them. But it's still not "a program" in the same sense a Knuth's program is.

Knuth's algorithm would, unsurprisingly, handle the huge inputs orders of magnitude more efficiently. And that is what McIlroy was aware of and intentionally hand-waved it in his "critique." Read the original text:

https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-...

But the major point is still: Knuth's task was not "use the existing programs" (or libraries) but "write a program" that does what's said to be done: The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.

And once that is being done, McIlroy's code would still be both less readable, less efficient and worse overall.

Which on the other side also doesn't mean that for some purposes "worse" isn't "better": https://yosefk.com/blog/what-worse-is-better-vs-the-right-th... But for some purposes, on "better" works and "worse" simply doesn't, e.g. when the scale of the problem is big enough. And Knuth teaches us how to solve such, harder problems. And presents the complete solution v.s. doing tricks (just call that library/executable which I'm going to avoid to explain you how it is implemented and what its limitations really are).

And giving the misunderstanding in the difference between showing how something is implemented (most efficiently) and that "just use pre-written tool X" approach, I understand even more why Knuth uses assembly in his "The Art of Computer Programming" books.

> But it's also not obvious, i.e. not something a "normal user" would "know":

> - even if you knew that "tr command translates... "sed ${1}q" how many people even today would know that one?

Are you suggesting it's ever been more likely for people to understand how to manage a trie structure in Pascal than use Unix command line tools? Or look flags up in the manpages?

Personally speaking, I'm comfortable doing both, but can't imagine many scenarios where I'd rather have ten pages of a custom datastructure than six lines of shell. (And they all involve either high volumes of data or situations where I can't easily get to the process-level tools.)

> The fair comparison would then include the source of all the sort, uniq, tr etc. programs which McIlroy used.

If you're including the code that provides the surface abstractions, where do you draw that line? If the code for sort, uniq, etc. is fair game, why not the code for the shell that provides the pipelining, the underlying OS, the file system, the firmware on the disk? After all, who's to say that the programs in the pipeline don't just run one after another with temporary files written to disk, rather than in parallel? (Which I've seen happen in reality.)

The same is true for the other side, of course. The 'fair comparison' could easily require Knuth's solution to include the source for weave/tangle, TeX/Metafont/CMR, the OS, etc.

> And once that is being done, McIlroy's code would still be both less readable, less efficient and worse overall.

What definition of 'worse' are you using?

* I expect sort/uniq/tr/sed to be more well tested and understood than a bespoke program.

* If there are issues with the program, it'll be easier to find skills/time to maintain a shell pipeline than custom trie-balancing code written in Pascal. (Sitting aside a prose description of the same.)

* The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.

* It's possible that the custom solution runs faster, but not obvious without testing. (None of which is worthwhile outside of demonstrated need.)

Point being: it's very easy to find a definition of 'worse' that applies more to the custom solution than to the pipeline.

> The shell pipeline runs on a machine that can be found in a retail store today, rather than requiring an elaborate download/build process.

That argument points to the fact that your “view” of the whole topic changes the assumed definition of the problem that was given to Knuth to solve. Read once again the original text: he was supposed to illustrate how the “Literate programming” could be ised while writing a program which solves a given program. It was definitely not “write an example of calling the existing pre-written programs”.

And, of course, it was all in 1986, definitely not “to target the machine which can be found in the retail store in 2018.”

McIlroy already behaved as the goal had been different than it was.

> McIlroy already behaved as the goal had been different than it was.

How would you feel about McIlroy's solution if it was semantically exactly the same, but written in a literate approach? (Essentially a version of 'weave/tangle', but for shell scripts.)

How would you feel if somebody would present the “six steps clicks in Excel and SQL Server” that eventually produce the same result? The starting goal was simply not “show how to use and combine external programs.” Even if you need some kind of skill to combine them. It’s exactly the same kind of missed fullfilment of the given starting task.
The reason I asked about a literate programming version of the shell script is that it speaks directly to Knuth's original stated goal: "I’ll try to prove the merits of literate programming by finding the best possible solution to whatever problem you pose"

In the context of that requirement, the it's the use of literate programming that's more of a concern than the specific implementation. (Which is why I asked about a literate version of the shell pipeline.)

Earlier in the thread, you also mention this concern around data volumes:

> If you have hundreds of gigabytes of input, you'd have to have at least that much more just to sort it. McIlroy's approach is a good one for one-off program or not too big input processing,

There, your concern is not justified by the stated requirements of the problem: "I did impose an efficiency constraint: a user should be able to find the 100 most frequent words in a twenty-page technical paper"

I do think McIlroy failed to solve the problem of demonstrating the value of literate programming, but I'm not sympathetic to arguments that he should've used more complex algorithms or relied on less library code. This is particularly the case when the additional complexity is only relevant in cases that don't fall into the given requirements.

(A literate program that uses SQL server or Excel might be an interesting read....)

Absolutely agree - Unix was ubiquitous in academia by 1986. I myself learned C on a VAX running BSD 4.2 in 1986.