Hacker News
by slowwriter 1066 days ago
I see. Don’t know if you’re still checking for replies on this thread. Livin’ up to my name. Thanks for taking the time to explain, though.

I’m going to have to look further into this to get a better understanding, but I suppose the rules for collapsing whitespace in a text node exist somewhere in the HTML specification, just not at the “interpretation” stage as I assumed.

To be clear, what I imagined was that at the interpretation stage, a text node would be marked to begin at the first non-whitespace character and end at the last non-whitespace character. Then, within the text node, there might be additional whitespace that would need to be collapsed into a single space.

Since the first type is not rendered at all and the second type is collapsed to a single space, I assumed the rules could exist at two different points in the process/pipeline.

So what I gather here is that both types are handled at a later stage than “interpretation” (whose output is basically what you see when you open Developer Tools and inspect individual nodes).

But I guess the subtlety here is that at whichever stage the whitespace collapsing/removal happens, the rules for it would still have to be defined by the HTML specification somehow.

And another subtlety to counteract that is that HTML is a markup language and not a programming language. One is rendered, the other is executed. So any comparison between, say, Python and HTML needs to take that into account.

So even though some whitespace gets ignored at some point between this:

<p>[whitespace]This textnode has extraneous whitespace[whitespace]</p>

and the point where [whitespace] is not rendered in the viewport, the fact that the ignoring does not happen at the “interpretation” stage is important, because that’s as far as the comparison between, say, Python and HTML can go before the two veer off in different directions.

I’m mainly typing this out for my own understanding, but again, I will have to look into it myself to validate or correct my current framework of thinking about this. Thanks for an interesting discussion.

1 comment

Pretty much. HTML parsing produces a document model (the DOM tree), where the model's whitespace matches pretty faithfully what's in the source document. At some later point, that model is massaged into the thing that you see and interact with, but the model itself retains everything. This is like a filter, if it helps to think of it that way, or a projection of a complex object (e.g. a 3D shape) onto a lesser substrate (e.g. a 2D plane).
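To make the "parsing keeps the whitespace" point concrete, here is a rough sketch using Python's standard-library html.parser. It is only a stand-in for a browser's parser, not the same algorithm, and the names TextCollector and parse_text_nodes are made up for illustration. The point is just that the text handed over by the parsing step still carries every space and newline from the source.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records the raw text the parser hands us."""

    def __init__(self):
        super().__init__()
        self.text_nodes = []

    def handle_data(self, data):
        # Called with the text between tags, whitespace included.
        self.text_nodes.append(data)


def parse_text_nodes(source):
    """Parse an HTML string and return its text chunks, untouched."""
    collector = TextCollector()
    collector.feed(source)
    collector.close()
    return collector.text_nodes


source = "<p>  This textnode has   extraneous whitespace \n</p>"
nodes = parse_text_nodes(source)

# Join the chunks and show them with repr() so the whitespace is visible.
print(repr("".join(nodes)))
```

The leading spaces, the run of three spaces, and the trailing newline are all still there after parsing. Nothing has been trimmed or collapsed at this stage.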

Offhand, and after a few glasses of wine, there are a couple points where the whitespace collapse will occur:

- at the display level, when it's time for the browser to actually put the thing on the screen, at least for CSS contexts where the white-space property is "normal" or something similar, or

- at the interaction level, when something like text selection happens, and the browser computes essentially the equivalent of node.innerText (versus node.textContent; alternatively: node.nodeValue, in cases where the node in question is a text node)
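
The "filter" idea can be sketched in a few lines of Python. This is a deliberately crude approximation of what a browser does for white-space: normal (the real rules also depend on line boxes, inline boundaries, and neighbouring nodes), and collapse_whitespace is a hypothetical name, not a real API. What it shows is that collapsing is a separate step computed from the stored text, and that the stored text itself is left alone.

```python
import re

# Runs of spaces, tabs, newlines, carriage returns and form feeds.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text):
    """Rough stand-in for the display-time collapse of a lone text node.

    Each run of whitespace becomes one space, then the single space
    left at either end is dropped, as if the text sat alone in a block.
    """
    collapsed = _COLLAPSIBLE.sub(" ", text)
    return collapsed.strip(" ")


# What the model stores (roughly what textContent would give you).
raw = "  This textnode has   extraneous whitespace \n"

# What ends up on screen, computed from the model.
rendered = collapse_whitespace(raw)

print(repr(raw))
print(repr(rendered))
```

The raw string is unchanged after the call. The collapsed version is a projection of it, which is roughly the relationship between node.textContent and what you see (or get back from node.innerText).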