by myfonj 1467 days ago
Well, this sounds like a really interesting observation. May I ask where exactly the original closing tags were located and what the stripped source looked like? I can imagine there _might_ be some differences among differently formatted code: e.g. I'd expect

    <p>Content<p>Content[EOF fig1]
to be (slightly) slower than

    <p>Content</p><p>Content</p>[EOF fig2]
(most likely because of some "backtracking" when hitting `<p[>]`), or

    <p>Content</p>
    <p>Content</p>[EOF fig3]
(with that small insignificant `\n` text node between paragraph nodes), which should possibly be faster than "the worst scenarios":

    <p>Content
    <p>Content[EOF fig4a]
or even

    <p>
    Content
    <p>
    Content
    [EOF fig4b]
with paragraph text nodes `["Content\n","Content"]` / `["\nContent\n","\nContent\n"]`, where the "\n" must also be preserved in the DOM but, due to white-space collapsing rules, is not present in the render tree (unless overridden by some non-default CSS), but still with the backtracking that

    <p>Content
    </p>
    <p>Content
    </p>[EOF fig5]
should eliminate (again, similarly to fig2 vs fig1).

(Sorry for wildly biased guesswork, worthless without measurements.)


It was just paragraphs of text. p, strong, em, and q mingled at most. No figures or images or anything of the like to radically shift DOM computations. That the effect can even be seen is probably due to the scale of the document; as I noted, it's a little larger than most things.

All paragraphs had a blank line between them, both with and without the p end tag. The p opening tag was always at the top-left, with no gap between it and the content.

So, for example:

    <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.</p>

    <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.</p>
Versus:

    <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.

    <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.
(You can also discount CSS from having a major effect. Less than a hundred lines of styles, where most rules are no more complicated than: `p { font-family: sans-serif; }`. No whitespace rules.)

However, if you wanted to look at this in a more scientific way - it should be entirely possible to generate test cases fairly easily, given the simplicity of the text data I saw my results with.
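
A minimal sketch of such a generator, assuming the layout described above (opening tag flush against the content, one blank line between paragraphs). The function name `buildTestCase` and the filler text are made up for illustration:

```javascript
// Hypothetical helper: join paragraphs with a blank line between them,
// either with or without the optional </p> end tag.
function buildTestCase(paragraphs, withEndTags) {
  return paragraphs
    .map((text) => '<p>' + text + (withEndTags ? '</p>' : ''))
    .join('\n\n');
}

// Placeholder filler text, not the real document.
const paragraphs = [];
for (let i = 1; i <= 3; i++) {
  paragraphs.push('Paragraph ' + i + ' with some <em>inline</em> markup.');
}

console.log(buildTestCase(paragraphs, true));
console.log(buildTestCase(paragraphs, false));
```

Feeding both variants to the same measurement would isolate the end tag as the only difference.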

Yay, thanks for the info and inspiration; sure, it seems like a fun weekend project.

(BTW your snippet's content sounds interesting and feels relatable, definitely intrigued.)

Finally did some synthetic measurements of (hopefully) parse times (not render, CSSOM or anything like that). The differences seem microscopic but are overall aligned with my initial expectations (omitting the closing tag actually shaves a bit of yak's hair), so I suspect that the real overhead you observed is caused by something happening after parsing, where the absence of trailing white-space in DOM nodes (ensured by closing tags) helps in some way. I guess something around white-space handling or text layout. (Speaking of insignificant white-space, you could probably gain some more microseconds if you stuck the paragraphs together (`..</p>\n\n<p>..` -> `..</p><p>..`), however such minification seems like a nuisance.)
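
For what it's worth, that "stick paragraphs together" step could be sketched as a single regex replace. This assumes plain `<p>` start tags without attributes and no `<pre>`-like content where the white-space matters; `stickParagraphs` is a made-up name:

```javascript
// Hypothetical helper: drop the white-space-only gap between a </p> end tag
// and the next <p> start tag. Not a general-purpose HTML minifier.
function stickParagraphs(html) {
  return html.replace(/<\/p>\s+<p>/g, '</p><p>');
}

console.log(stickParagraphs('<p>one</p>\n\n<p>two</p>\n\n<p>three</p>'));
```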

Tested only on Windows, in browser consoles.

Numbers:

Firefox (Nightly) (performance.now is clamped to milliseconds)

    total; median; average; snippet
    2279.0; 4.0; 4.558; '<p>_'
    2652.0; 4.0; 5.304; '<p>_</p>'
    2471.0; 4.0; 4.942; '<p>_abcd'
    2387.0; 4.0; 4.774; '<p>_\n'
    3615.0; 5.0; 7.230; '<p>_</p>\n'
    2380.0; 4.0; 4.760; '<p>_abcd\n'
    3093.0; 5.0; 6.186; '<p>_\n</p>\n'
    3107.0; 5.0; 6.214; '<p>_</p>\n\n'
    2317.0; 4.0; 4.634; '<p>_abcd\n\n'
    2344.0; 4.0; 4.688; '<p>_\n\n'
Google Chrome (performance.now is sub-millisecond)

    total; median; average; snippet
    2870.4; 5.2; 5.741; '<p>_'
    2895.2; 5.4; 5.790; '<p>_</p>'
    2684.7; 5.2; 5.369; '<p>_abcd'
    2845.4; 5.2; 5.690; '<p>_\n'
    3836.7; 7.3; 7.673; '<p>_</p>\n'
    2837.8; 5.2; 5.676; '<p>_abcd\n'
    4022.5; 7.4; 8.045; '<p>_\n</p>\n'
    4044.3; 7.3; 8.089; '<p>_</p>\n\n'
    2928.4; 5.2; 5.857; '<p>_abcd\n\n'
    2805.3; 5.2; 5.611; '<p>_\n\n'
Test config

    Snippets per document: 5000
    Rounds: 500
    Wrap: '<!doctype html>(items-paragraphs)'
    Content each item (_): bunch of random digits chunks, something like '1943965927 52 27 5 51664138859173 5161 7226 5 15 2 55679 6553712585'
Code: https://gist.github.com/myfonj/57a6a8fcb1c5686527412543a897c...
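
The gist is the actual code. For readers who only want the gist of the gist, here is a rough sketch of this kind of harness, not a copy of it. The names `summarize`, `measureParse` and `browserParse` are made up, and the parse function is passed in so the timing logic can also be tried outside a browser:

```javascript
// Fall back to Date.now where performance.now is unavailable.
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Reduce per-round timings to the three columns used in the tables above.
function summarize(timings) {
  const sorted = [...timings].sort((a, b) => a - b);
  const total = sorted.reduce((sum, t) => sum + t, 0);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
  return { total, median, average: total / sorted.length };
}

// Parse the same HTML string `rounds` times and time each round.
function measureParse(html, rounds, parse) {
  const timings = [];
  for (let i = 0; i < rounds; i++) {
    const start = now();
    parse(html);
    timings.push(now() - start);
  }
  return summarize(timings);
}

// Browser-only parse function: DOMParser builds a document without rendering it.
const browserParse = (html) => new DOMParser().parseFromString(html, 'text/html');

// Example for a browser console (filler content is a placeholder):
// const doc = '<!doctype html>' + '<p>1943965927 52 27 5</p>\n'.repeat(5000);
// console.log(measureParse(doc, 500, browserParse));
```

Note that with a clamped `performance.now`, as in Firefox, single-round timings are coarse, so the total over many rounds is the more telling number.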

(Before realizing I could use a synthetic DOMParser, I made something that measures document load time in an iframe (http://myfonj.github.io/tst/html-parsing-times.html), but it gives quite unconvincing results, although probably closer to the real world. Understandably, the synthetic DOMParser can crunch much more code than a visible iframe.)