Hacker News
by thr-nrg 1124 days ago
>* The "includes" problem. Most languages require you to include/import/require packages used in a file. These are usually all placed at the start. This means a lot of chapters will start with "here are all the includes we'll need" if you're using a linear format. More complex formats that rearrange the code to generate outputs can do a better job but it's still a little clunky.

Real literate programming, rather than rich text comments, can order the code blocks in any order. You can add them at the very end of the book/chapter/section if you feel like it.

I wrote some literate programs about a previous generation of our program. Every new hire reads them and asks me for copies.


> Real literate programming, rather than rich text comments, can order the code blocks in any order. You can add them at the very end of the book/chapter/section if you feel like it.

And yet Knuth still does it the other way. <http://akkartik.name/post/literate-programming>

Does it matter though? The primary purpose of using literate programming is expository. If Knuth feels that putting the includes at the start makes sense, then that's what he does. If you don't, then fine: move them to the end or the middle. That's the benefit of WEB (and WEB-derived systems). The presentation order is not dependent on the code order, it's dependent on what makes the most expository sense.

When I've written literate programs, I almost always shuffle long lists of includes to the end. They add little or no value at the top and distract from the material I want to present. It's also a simple cut and paste to move them back to the top if I want to. The only time I leave them at the top is if there's something actually informative about having them there, or it's a short program, or I haven't actually finished working on it (a lot of my literate programs start as traditional "live in source files" programs that I slurp into org files).

> Does it matter though?

Yes.

> The primary purpose of using literate programming is expository.

Right. A big, zero-context block of includes, prefaced with a comment telling you to just "skip ahead" past it to the "interesting stuff" and offering no insight about what you're skipping over, is the opposite of expository. The main program (what it actually does) is sufficient to serve as exposition for the includes it ends up needing. So actually write it that way.

> That's the benefit of WEB[...] The presentation order is not dependent on the code order, it's dependent on what makes the most expository sense.

The argument is that it doesn't make expository sense to write the includes the way Knuth does. It's just yet another form of boilerplate, as taeric observes, which makes us slaves to what the (LP-unaware) compiler expects, and which LP is supposed to liberate us from...

* * * *

It strikes me that there are probably only two ways to really deal with includes in the spirit of literate programming: either (a) put them at the end after having already shown us why they're necessary (accompanied with, e.g., a comment along the lines of "Recall that our program is using printf, a part of the C standard library defined in stdio.h, so it's necessary that we include it in order for our program to compile"), or (b) use WEB/CWEB's appending facilities: after having written a routine that uses printf for the first time, you write an interlude not altogether different from what I just described that adds that include to the "running sum" of necessary includes, which the tangle step will take care of as part of the build.

(I say "only" because these are the only ways that occur to me that it can reasonably be done. I'm not committed to that being true; I'm open to the possibility of there being more, and it's not as if I started with that in mind, but it's hard for me to imagine others. The only thing I'm really committed to is that what the article says about Knuth's two examples being wrong is accurate. I know Kartik has softened his stance since originally writing it, but I haven't. Each of the examples given is an awful way to demonstrate LP, considering how antithetical that approach is to the whole thing. Totally indefensible.)

My point, which you missed: It doesn't matter what Knuth does.

thr-nrg rightly pointed out in their comment that you can place the code wherever you want. You then say, "Knuth puts it at the top" (meant as a criticism of his style).

My point (so you don't miss it again): You can place the code wherever you want when doing a literate programming style. It also doesn't matter what Knuth does with his includes because his style does not dictate your style or my style or anyone else's style. So put it wherever you want. His style does not matter.

I didn't miss it. I replied to your question of whether it matters: yes, it does.

Your point doesn't make any sense. ("_You_ can place the code wherever _you_ want when doing a literate programming style.") Literate programs are supposed to be read. Of course a writer can write however they want. You could write a book where every word that isn't a proper noun ends with an uppercase letter and all proper nouns are lowercase. It will be shit.

What matters is the exposition. Does the presentation (order, style, etc.) communicate what you intend it to communicate, and does it communicate effectively? That's what matters.

Does it matter what Knuth does? Nope. I don't care for that aspect of his style. I already said I put the includes (when a text gets past the draft stage at least) toward the end. We're in violent agreement with respect to how we (if I understood you correctly) often or generally put them into LP texts.

You're just weirdly caught up on criticizing Knuth's style for some reason. Have fun shouting into the wind. The rest of us know it doesn't matter. The goal is to communicate effectively and not whine about things that don't matter.

I'm not entirely sure that trying to not have boilerplate of any kind is a worthy goal. Consider, no book is wanting to get rid of the title page. Even a dedication page has grown to be a near-required page that we literally teach students to write.

To that end, the goal is to not have parts that can be the same between programs vary. That has pedagogical value. It is a type of stability that we don't really value much anymore.

Though, I also give a +1 to jtsummers' point. Knuth has a style of writing literate programs. And if you find some old videos of him reviewing students' code, he explicitly calls out points of style that he likes from different students and how they work together. He didn't necessarily want to push a "this is how you style your programs," but instead was working the way you talk about programs into a way that you can also write them, without necessarily having to relearn either.

> I'm not entirely sure that trying to not have boilerplate of any kind is a worthy goal.

I think it's a worthy goal for nearly all readers. However, I also believe that literate programs should be written with many different readers in mind.

So you could have one version that is written in a "typical code" style like:

    * library 1
    ** module 1
    *** source file 1
    ** module 2
    *** source file 2

That doesn't need to be where it ends, though, and arguably shouldn't be. Write multiple books using the same code, represented differently where that's justified.

For example:

    * typical program style
    ** library 1
    *** module 1
    **** source file 1
    *** module 2
    **** source file 2

    * new to language X presentation of program

    ** link to various parts of library 1 alongside prose to describe concepts to language newcomer

    * typical hire presentation of program

    ** business domain focused exposition presenting code via transclusion as needed from typical program style

People will inevitably complain that what you're advocating for involves too much duplication, but I've independently reached the same conclusion that you have. Besides, in a good LP system, the toolchain would be able to consume the second part and transform and emit the first part.

See also this conversation about the unreadability of Knuth's .web files: <https://news.ycombinator.com/item?id=29203798#29205840>

> Consider, no book is wanting to get rid of the title page.

Lists of imports are not title pages.

> the goal is to not have parts that can be the same between programs vary

Lists of imports vary. (And in today's world of extreme software reuse, which other pieces of software a program depends on is of greater interest than ever.)

I'd take your point on boilerplate if it were necessary and unavoidable. I've already described, however, that it isn't, and how to avoid it.

Depends on the list? The ones you linked from Knuth are basically fixed imports that are, in fact, very common and easy to take as a whole.

I think the disconnect is still that you are writing a C program with CWEB. If I'm using a literate style to write JavaScript, I'm still writing a JavaScript program. As such, the point isn't to just throw any established norms out, but to rearrange for presentation. If some items go well together, might as well keep them together.

Most programming tools that can do this require you to "bundle" things such that you have to have well-formed containers, if you will. Having the "add this to a section" constructs really helps, as then you can say things like "add this to the imports" the first time you need a new import. But if you don't add any non-standard imports, you can forgo doing that.

The other approach is to try and make everything implied, such that new users can skip any of the boilerplate and jump straight to programming. That is fine, for what it is, but does little to build understanding of the programs as a whole.

That said, I may have missed your described solution. I'm kind of scattershot today and not keeping fully on top of this. Apologies if I'm talking past you.

I missed this conversation earlier, but see my response to that post here: https://news.ycombinator.com/item?id=29871047 (also see https://news.ycombinator.com/item?id=30762055) — in short, literate programming is writing; writing is done with an audience in mind; you don't necessarily have to explain everything; Knuth only chooses to explain the tricky stuff rather than language features or obvious (to him) #includes, especially in these throwaway programs that he wrote for himself.

Note the end of the sentence you quoted: “if you feel like it” — Knuth clearly doesn't feel like it in these programs. (BTW, it appears he generally adds a comment next to each #include mentioning why it's being included.)

Literate programming, for Knuth, is supposed to be a fun way to write programs that leads to them being easier to write and read and debug and maintain; it's not some sort of purity discipline that requires everything to be maximally expository.

> Now, writing is always (best) done with a specific reader in mind: you assume the reader has a certain background/prerequisites: some things that don't need explaining, and some things that do.

I don't buy it. The same explanation can be used to excuse the use of traditional (non-LP) programming systems—raising questions about why make a fuss about LP at all at that point if an LP text is going to treat the same types of shortcomings as a given.

As the commenters on Ward's wiki pointed out, Knuth's examples come across as still being written for the compiler: what he's mostly succeeded at is just coming up with a different syntax for it to consume in a roundabout way.

> Depending on the reader you're targeting (e.g., yourself a few years from now), and how polished you're trying to make your presentation, you may well choose to take it for granted and not bother explaining that a C program will have some obvious #includes at the top.

But of all the things that Knuth could explain, that's the one thing he does explain. It's not a lack of an explanation that the program will have some includes that is the problem. It's the impertinence of immediately dumping a list of includes on the reader, in a total failure of exposition.

We don't have to focus exclusively on the includes to see this problem. The define at the top of the Symmetric Hamiltonian cycles exhibits the same thing just as clearly.

I have another theory that I've floated, which is that Knuth realized that there's something wrong with how traditional Pascal and C compilers force you to write/read your programs bottom-up, but by the time he came up with literate programming he was already so warped and tainted from years of doing work in the bottom-up tradition to satisfy the compiler that it ends up clouding his vision, even when he knows that's the mindset he's trying to actively work against.

I was thinking of this again this weekend, after spending a couple of hours reading some literate programs again, in light of Knuth's recent (via https://twitter.com/ksmeel/status/1661890306370072576) note https://cs.stanford.edu/~knuth/papers/cvm-note.pdf and associated program https://shreevatsa.github.io/knuth-literate-programs/program..., a CWEB program he wrote a few days ago. It's been 9–10 days here and I'm not sure anyone will read this (I'm surprised HN still lets me reply), but anyway…

It seems that your main complaint is that in Knuth's literate programs, he does not explain enough. To which my answer is: Sure! LP doesn't mean you have to explain everything; you can choose to explain as much as you want. It's not a "moralism" like "structured programming"; it's just a tool.

I think everything you're saying is along the same lines:

> The same explanation can be used to excuse the use of traditional (non-LP) programming systems—raising questions about why make a fuss about LP at all at that point if an LP text is going to treat the same types of shortcomings as a given.

Firstly, "excuse" seems to suggest an accusation that something is wrong with non-literate programming, which has not been made except as a joke — LP is just presented as a tool, with the expectation that it won't work for everyone. (Knuth is against moralism in programming style, and has often complained about it, e.g. in the context of defending GOTO and comparing pointers and what not.) So your question is basically asking: why attempt to explain at all, if one is not going to explain everything? The answer is, even without explaining everything, explaining as much as he does seems to work for him; he's been doing this for 40 years now and continues to rave about it.

> But of all the things that Knuth could explain, that's the one thing he does explain.

(I didn't understand this part.)

> immediately dumping a list of includes on the reader

As a rule of thumb, Knuth has said somewhere (maybe in the early days of LP, for TeX) that he targeted around 12 lines per LP section. So I would imagine that if he ever wrote a program that had more than about 12 includes (which is pretty hard to imagine :-)), he would split up the list of includes into multiple sections, presented separately. Below that (like the four or five includes in these examples), there's not much value in splitting it up further. I guess there's a lesson/hint here that splitting something up into sections is not "free" and below some threshold starts having more cost than benefit. (Just like the cost of having lots of short functions in modern-day non-LP programs: Ousterhout's A Philosophy of Software Design has some succinct words about that: https://web.stanford.edu/~ouster/cgi-bin/aposd2ndEdExtract.p....)

In any case, not having separate sections for #includes is not a failure of imagination: if you look at the programs on his webpage, the very first example ("used as a handout for a lecture on literate programming") has #includes separated out: https://shreevatsa.github.io/knuth-literate-programs/program... — note that in section 2, after a self-mocking joke ("First we brush up our Shakespeare by quoting from The Merchant of Venice … This makes the program literate."), he uses printf, and in the next section includes stdio.h ("Since we’re using the printf routine"), and then in section 7 he has another include, with the words "UNIX’s localtime function does most of the work for us, but we need to include another system header file before we can use it." So the fact that he presented includes like this in his early demonstration program (October 1992), and does not bother to do so in later programs (including last week: May 2023) seems worth thinking about.

> The define at the top of the Symmetric Hamiltonian cycles exhibits the same thing just as clearly.

As mentioned before, this is one of the programs that "use the conventions and library of The Stanford GraphBase". If you look at the program (https://shreevatsa.github.io/knuth-literate-programs/program...), it starts with "We use a utility field to record the vertex degrees" and then has "#define deg u.I". This is in fact part of those conventions — in the SGB book index, there's an entry for "utility fields" pointing to pages 38–39 and 284, where this is explained:

> Every Vertex record contains eight subfields. We have already mentioned name and arcs; the other six subfields are called utility fields because they can be used for many different purposes. […] The six utility fields are named u, v, w, x, y, z, and their five possible interpretations are distinguished by adding one of the respective suffixes .I, .S, .G, .V, .A; thus, for example, v->w.I stands for the integer in utility field w of the Vertex record pointed to by v.

> Utility field names are usually given meaningful aliases by means of macro definitions. For example, GB-GAMES defines nickname to mean y.S.

and so on, and the book is full of definitions like "#define source a.V" and "#define back_arcs z.A" — so when the SHAM program begins with "we use a utility field" and then "#define deg u.I", this is established convention, not some wild thing pulled out of nowhere. When Kartik says “presumably a struct whose definition — whose very type name — we haven't even seen yet. (The variable name for the struct is also hard-coded in”, all of these presumptions are wrong: `u` is not a hard-coded variable name but the name of a field in Vertex (over the course of the program we can see v->deg, x->deg, u->deg, a->tip->deg: there's an index entry for "deg"), and the Vertex struct's definition and type name are well-documented (in a published book). I find it easy to take at face value that this was really the order in which Knuth thought of things, and also the level of exposition that he finds most useful (for his intended audience, which is himself).

This points at a problem with literate programming, that I also mentioned in the other thread: because it can be so personal, everyone who uses LP has their own idea of what is most worth explaining, and even ends up building their own LP tools. (Similar things are said about every Lisp programmer ending up with their own idioms and mini-languages.)

> he was already so warped and tainted from years of doing work in the bottom-up tradition to satisfy the compiler that it ends up clouding his vision

Another way of saying this is that Knuth never believes in hiding the fact that you're writing code that a compiler will read and that a computer will execute — in fact he continues to annotate variables with "register" even though compilers ignore it, simply because he likes to be conscious about what instructions the (ideal) machine will execute — he's not a big believer in abstraction and hiding the details by passing to a higher level; he believes in (and somehow manages) constantly being aware of all levels at once.

Yet another way of saying this (my theory) is that when machine code / assembly programs became too hard to maintain, the rest of the world solved the problem by, over time, settling on abstraction, interfaces, higher-level languages, style conventions, information-hiding, and all that. Instead, Knuth has forged his own programming path/style, still basically writing machine-sympathetic programs, but "explaining to human beings what we want a computer to do". It is up to others to merge LP's human-orientation with mainstream ideas… if it will ever happen.

Anyway, it seems that the criticisms we're discussing are mostly based on first impressions. While they are valuable (and indicative of what others' first impressions will be), ultimately I think criticisms based on studying these programs more closely would be more interesting. E.g. has his LP style evolved over time? What can we learn from studying recent programs (there are over a dozen since 2020 alone, like the program above or others posted on his webpage and which I just typeset yesterday at https://shreevatsa.github.io/knuth-literate-programs/program...) — what can we learn from what he chooses to explain and not; why does he make those choices; what way does this seem to help him: can we use these insights (not the same style) for our own programming practice? Things like that.

> It's been 9–10 days here and I'm not sure anyone will read this (I'm surprised HN still lets me reply)

HN gives you 14 days.

(I wish it had a feature to "mark" a comment in lieu of writing a reply then and there. When 12 or 13 days have passed, it'd send you an email reminding you that this is your last chance to add what you have to say to the record. In this way, it would favor people going away and coming back to leave more thoughtful comments instead of impelling them to reply instantaneously i.e. to avoid the risk of not being heard.)

I suppose the hack would be to set your delay in your profile to 60 × 24 × 12.5 = 18000 (minutes), post your reply, then reset it to its initial value. You'd progressively edit your comment as the opportunity comes up, and then it will be autoposted in whatever form it's in when the delay elapses. You can only edit posts for up to an hour (or something), though, and I'm not sure whether that timer starts from the time you hit submit or when it appears in the comments (i.e. after the delay has elapsed).

I think the idea of a general preamble that would have the same explanation, every time, is fine? This is literally the boilerplate concept.

The idea is more to break the code into parts you would explain and grok easily. Not every atomic part of the code.

See my response to Jtsummers. <https://news.ycombinator.com/item?id=35989931>

And? Knuth also used Pascal for the original Web and TeX. Both are still Pascal programs that get transpiled to C before being compiled. Just because he does something doesn't mean you should copy it religiously.

Web and its derivatives are sufficiently advanced that they suffer from the Lisp curse. What in other systems are major fundamental engineering problems (try adding an include in the middle of a C file) in Web derivatives are a matter of taste. Do you have a chunk that picks up includes as you need them? Is it a big one at the front or back? It's up to you.

> And?

And nothing. For programs that are ostensibly meant to be read, Knuth's examples are poor ones.

> in Web derivatives are a matter of taste

Dumping a bunch of includes at the top and saying the equivalent of "don't worry about this boring stuff for now; you can skip it if you want" (and thereby compromising the entire LP experiment) is a matter of taste in the way that putting beef in a vegan casserole is.