by cxr
1066 days ago
Nope. Try it out:
  $ dump ./scratch/p.html
  3c 70 3e 20 20 0a 20 20 49 27 6d 20 61 20 74 65
  <  p  >        .        I  '  m     a     t  e
  78 74 20 6e 6f 64 65 20 5b 20 20 20 20 5d 20 20
  x  t     n  o  d  e     [              ]
  20 20 3c 2f 70 3e 0a
        <  /  p  >  .
(I replaced your first square bracket sequence with two spaces followed by a newline (U+000A) followed by two more spaces, and I replaced the second square bracket sequence with a space followed by a literal left square bracket, followed by four space characters, followed by a literal right square bracket, followed by four more spaces.)
The text node's value is exactly the sequence of characters between the closing angle bracket in `<p>` and the opening angle bracket in `</p>`: "  \n  I'm a text node [    ]    "
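If you don't have a hex dumper handy, here is a minimal sketch of the same check using Python's standard-library `html.parser`. It is not a spec-compliant HTML5 parser the way a browser's is, so treat it only as an illustration; the class and function names (`ParagraphText`, `paragraph_text`) are made up for this example.

```python
from html.parser import HTMLParser

# Same bytes as the hex dump above: two spaces, newline, two spaces, the
# text, a bracket pair around four spaces, then four trailing spaces.
MARKUP = "<p>  \n  I'm a text node [    ]    </p>\n"


class ParagraphText(HTMLParser):
    """Collects the character data found inside <p> elements, unmodified."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <p> elements we are currently inside
        self.chunks = []    # character data seen while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # The parser hands over the text as written; nothing is trimmed.
        if self.depth:
            self.chunks.append(data)


def paragraph_text(markup):
    parser = ParagraphText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


text = paragraph_text(MARKUP)
print(repr(text))
```

The printed value still has the leading whitespace, the newline, the four spaces inside the brackets, and the trailing spaces.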
> The browser is merely the interpreter of the language, but the HTML specification lays out how the language should be interpreted.
You're right about the second half, but you're wrong in thinking that it says extra whitespace should be ignored. It doesn't. The bigger problem, though, is in the first half. I think you have an oversimplified understanding of what's going on in a browser and of the relationship that HTML has to what you see when the browser paints the content on the screen and lets you interact with it. You seem to misunderstand the pipeline that sits between the markup and what you actually get when you open the page in a browser. There's a lot more to it than the browser being "merely the interpreter" for HTML.
|
I’m going to have to look further into this to get a better understanding. I suppose the rules for collapsing whitespace in a text node exist somewhere in the HTML specification, just not at the “interpretation” stage as I assumed.
To be clear, what I imagined was that at the interpretation stage a text node would be marked to begin at the first non-whitespace character and end at the last non-whitespace character. Then, within the text node, there might be additional whitespace that would need to be collapsed into a single space.
Since the first type is not rendered at all and the second type is collapsed to a single space I assumed the rules could exist at two different points in the process/pipeline.
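A rough sketch of what that later step might look like, applied to the untouched text node value from the hex dump in the parent comment. As far as I know, in browsers this is controlled at rendering time by the CSS `white-space` property, whose default value collapses whitespace, and not by the parser. The function name and the regex are illustrative only, and the real rules have more cases (line wrapping, neighbouring inline elements, other `white-space` values).

```python
import re


def collapse_for_display(text):
    """Very simplified approximation of default whitespace collapsing."""
    # Runs of spaces, tabs and line breaks become a single space...
    collapsed = re.sub(r"[ \t\n\r\f]+", " ", text)
    # ...and a space left at the very start or end of the line is dropped.
    return collapsed.strip(" ")


raw = "  \n  I'm a text node [    ]    "   # what the text node holds
shown = collapse_for_display(raw)          # roughly what gets painted
print(repr(raw))
print(repr(shown))
```

Both kinds of whitespace handling happen in the same function here, and `raw` itself is never modified: the node keeps every character, only the displayed form changes.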
So what I gather here is that both kinds of whitespace handling happen at a later stage than “interpretation” (the result of which is basically what you see when you open Developer Tools and inspect individual nodes).
But I guess the subtlety here is that at whichever stage the whitespace collapsing/removal happens, the rules for it would still have to be defined by the HTML specification somehow.
And another subtlety to counteract that is that HTML is a markup language and not a programming language. One is executed, one is rendered. So any comparison between, say, Python and HTML needs to take that into account.
So even though some whitespace is ignored at some point between this markup:
<p>[whitespace]This textnode has extraneous whitespace[whitespace]</p>
and the point where [whitespace] is not rendered in the viewport, the fact that the ignoring does not happen at the “interpretation” stage is important, because that’s as far as the comparison between, say, Python and HTML can go before the two veer off in different directions.
I’m mainly typing this out for my own understanding, but again, I will have to look into it myself to validate or correct my current framework of thinking about this. Thanks for an interesting discussion.