Hacker News
by kgtm 3566 days ago
Maybe it will make more sense once it fully sinks in, but I think in general it is a mistake to make developers think about when and where certain things can be omitted. It's more straightforward to simply do one thing, consistently, following the "explicit is better than implicit" mantra.

What happened to optimizing for mental overhead instead of file size? This simply should be a build step, part of your minification and concatenation dance, not having to consider all of these when trying to decide if I should close my <p> tag or not:

A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element.
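For anyone who would rather have a tool carry that rule around, here is a minimal Python sketch of it as a predicate. The function name and its two parameters are made up for illustration, and it only looks at tag names: it assumes the parent is an HTML element and treats any parent name containing a hyphen as an autonomous custom element.

```python
# Tags whose start tag lets a preceding <p> omit its end tag (list quoted above).
P_CLOSING_FOLLOWERS = frozenset("""
address article aside blockquote details div dl fieldset figcaption figure
footer form h1 h2 h3 h4 h5 h6 header hgroup hr main menu nav ol p pre
section table ul
""".split())

# Parents in which a trailing </p> still has to be written out.
EXCLUDED_PARENTS = frozenset("a audio del ins map noscript video".split())


def p_end_tag_optional(next_sibling, parent):
    """Return True if </p> may be omitted (hypothetical helper).

    next_sibling: tag name of the element right after the p, or None if
                  there is no more content in the parent.
    parent:       tag name of the p's parent element (assumed to be HTML).
    """
    if next_sibling is not None:
        return next_sibling.lower() in P_CLOSING_FOLLOWERS
    parent = parent.lower()
    # Assumption: a hyphen in the name marks an autonomous custom element.
    is_custom = "-" in parent
    return parent not in EXCLUDED_PARENTS and not is_custom


if __name__ == "__main__":
    print(p_end_tag_optional("nav", "body"))   # nav is in the follower list
    print(p_end_tag_optional("span", "body"))  # span is not
    print(p_end_tag_optional(None, "a"))       # last child of an <a>
```

Even as a sketch it takes two lookup tables and a special case, which is rather the point about mental overhead.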

8 comments

This reasoning is why I write all the web pages for my personal projects using XHTML. I can't be bothered to remember which tags are self-closing, which tags need explicit closing tags which can't be combined into the opening tag, etc. Everything's consistent in XHTML.
Agreed. Years ago I started doing all my projects in XHTML because I found that debugging silent HTML errors was not fun.

Silent errors include things like malformed tags and attributes, incorrect nesting structure (thus also messing up where CSS rules are applied), and unescaped left-angles and ampersands.

This is why I've always advocated DTD-validation of HTML (which is shockingly underused).
I've never actually seen anyone validate their HTML. If you suggested this in most companies, they would look at you like you had two heads.
About a decade ago it was a pretty commonplace thing to happen.

HTML 4/4.01 was kind of messy and could have rendering issues. So going with an (X)HTML validator was a common thing, as well as a marketable value proposition to clients.

HTML 5 had much "saner" implementations, so validators fell by the wayside as they weren't as necessary for compatibility.

A good text editor will validate against the DTD as you type. And after you publish, you can use https://validator.w3.org/
The Firefox source viewer (not the developer tools DOM viewer) does validation. It will highlight bad tags in red and if you hover over them it shows the error.
I'm pretty sure Moz has an HTML validator built into its SEO tool, so it may be more common than you think solely because of that. We validate HTML at my company. If we don't, we'll hear about it next time our boss runs an SEO check.
How can you validate HTML when most of the HTML these days is templated and generated dynamically?
It turns into a complete document eventually.
It's why I stopped using hand coded markup at all, aside from markdown for article data. Everything else is data pushed into templates that generate "whatever the code is that the client needs to receive", and let the build tools figure it out. That's what they're for.
As long as you're sure it will never be interpreted as HTML, you can do that. Which is harder than it should be, because doctype declarations are ignored. One lost header or unforeseen embedding and everything after that <script /> tag gets eaten.
I've started using haml recently, it handles this for you and works well for me.
You shouldn't have to remember it, but your editor could and should.
XML validators are much more common in editors than HTML validators, probably because XML is both easier to parse and used for a lot more than XHTML.
I write and edit all my Genshi [1] templates as xhtml, so I can validate and process them as crisp clean hi-fidelity xml, and then pump them out to browsers with the html serializer [2].

If I were inclined to follow Google's guidelines on omitting optional tags, it would be easy to write a stream filter that removed them [3].

But I prefer source templates to have all the explicit properly indented structure, so they're easier to validate and process with XML tools (and by eye), and unintentional mistakes don't sneak through as easily.

For the same reason, I also prefer not to write minified JavaScript source code: that should be done by post-processors, not humans. ;)

[1] https://genshi.edgewall.org/

[2] https://genshi.edgewall.org/wiki/ApiDocs/genshi.output

[3] https://genshi.edgewall.org/wiki/Documentation/streams.html#...

If you are writing <p>something <div>like this</div><p> then your editor knows you are making a mistake and can highlight it.

If on the other hand you are not closing tags that autoclose, how can your editor tell you? There is no way to know it's not intended.

The editor could have a setting for that.

But mostly I meant that you don't have to close the autoclosing tags.

interesting - which mime/type do you use?
application/xhtml+xml as it's supposed to be, though it's not that I have a choice since Github sets all the headers.
So you don't? Coz it would be nuts.. I'd wager 90%+ of the 3rd party JavaScript out there will choke on it.
You can use text/html. Technically it wouldn't be an XML resource, but it's correct for HTML5. You can also use an xhtml doctype[1]. And don't forget that the HTML5 namespace is http://www.w3.org/1999/xhtml! [2] So basically you use your xhtml tools and just publish as HTML5.

[1] https://www.w3.org/TR/html5/syntax.html#obsolete-permitted-d... [2] https://www.w3.org/TR/html5/infrastructure.html#namespaces

"I can't be bothered to remember which tags are self-closing, which tags need explicit closing tags which can't be combined into the opening tag, etc"

You're right, let's not bother ourselves with these small things, cause === and == do the exact same comparison in Javascript and all browsers are exact replicates when implementing html, css and javascript.

Beyond all the sarcasm, in reality, web programming is a hassle. But other programming languages and markups have their quirks as well. I'm glad you found a solution, but it doesn't mean we shouldn't look at the fine details of a specification.

>cause === and == do the exact same comparison in Javascript

I also don't bother remembering how == works. I use === everywhere. The reason is the same - lower cognitive overhead.

>all browsers are exact replicates when implementing html, css and javascript

The browsers I care about all parse XML correctly.

>I'm glad you found a solution, but it doesn't mean we shouldn't look at the fine details of a specification.

I'm only talking about myself, yes. I only make websites for my personal projects. I'm certainly not a web dev by profession or even by hobby.

I don't understand your argument. Yes, web programming has lots of warts and subtle behaviors and inconsistencies. So shouldn't we jump on a chance to remove a small part of that from our day-to-day development? OP isn't advocating ignorance of the spec, just a way not to need to reason with it as often.
No argument, just a comment that displays my disapproval of not fully complying with the spec before blaming it.
How is anyone not complying with the spec by not micro-optimizing away legal-but-redundant tags?
You already have to consider all of those cases about the <p> tag: because they auto close when they hit one of those elements, that means that <p> tags can't contain any of them. If you don't know about this while using <p> tags, you can be in for a world of fun mysterious issues.
But all those tags are things that no sane developer would put inside a p tag anyways, so you don't really have to think about them.

The real mental overhead is incurred when reasoning about the tag following the p, which could be anything. "Hmmm, I have a nav tag coming after this p tag. Does that implicitly close it?"

Although if you had a good autoindenter, you could catch any mistakes by how it was indented. "Oh, that nav tag is on the same indentation level as the p tag, I guess it does implicitly close it."

I have done web dev on and off for over 15 years and I've never even thought about what happens when you put a h1 in a p. In my opinion the browser should crash and the operating system should BSOD. I have always been severely annoyed by the amount of shit browsers put up with. I don't understand why XHTML strict didn't get the traction it deserved and why they didn't continue along that line with HTML5.
Because the world is made up of messy people. And the value of allowing messy content was perceived as outweighing the value of consistency and reliability. I happen to agree.
I ran into this when working on some software that put user comments in <p> tags. I added some allowed markup that came out as <div> tags for a collapsible section. It didn't strike me as a particularly insane feature, but I about lost my mind trying to figure out why the <div> tags appeared to negate the <p> tag styling for all of the text after it.
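A rough way to see what happened there is to track which elements are still open as the markup is read. The Python sketch below uses the standard library's html.parser (which does not implement the real HTML5 tree builder) and a deliberately simplified auto-close rule; the class and function names are illustrative only.

```python
from html.parser import HTMLParser

# Start tags that implicitly close an open <p> (same list as in the spec text).
P_CLOSERS = frozenset("""
address article aside blockquote details div dl fieldset figcaption figure
footer form h1 h2 h3 h4 h5 h6 header hgroup hr main menu nav ol p pre
section table ul
""".split())

# Void elements never stay open, so they are not pushed on the stack.
VOID = frozenset("area base br col embed hr img input link meta source track wbr".split())


class OpenElementTracker(HTMLParser):
    """Simplified model: records which elements are open around each text run."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.contexts = []  # list of (text, tuple_of_open_elements)

    def handle_starttag(self, tag, attrs):
        if tag in P_CLOSERS and "p" in self.stack:
            # Implicitly close the open <p> and anything nested inside it.
            while self.stack.pop() != "p":
                pass
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass
        # A stray end tag (such as the now-orphaned </p>) is ignored here;
        # a real browser would create an empty paragraph for it.

    def handle_data(self, data):
        if data.strip():
            self.contexts.append((data.strip(), tuple(self.stack)))


def text_contexts(html):
    tracker = OpenElementTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.contexts


if __name__ == "__main__":
    comment = '<p class="comment">before <div class="collapsible">hidden</div> after</p>'
    for text, open_elements in text_contexts(comment):
        print(repr(text), "inside", open_elements)
```

In this model the text following the div is no longer inside any p, which matches the styling that seemed to vanish.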
> This simply should be a build step

This is a great point, but when I think of build steps, I think of something like minifying which comes with a performance gain.

I'm not sure I see what the obvious gain to omitting optional tags in the way Google suggests is.

Edit: To clarify, I'm wondering if there's some performance gain by the browser not having to parse the implicit optional tags.

How is a build step that turns your HTML into a smaller amount of HTML with the exact same behavior (by removing optional tags) different from a "minification" step that turns your HTML or JS into a smaller amount of HTML/JS with the exact same behavior?

This is minification, isn't it?

Updated to clarify my comment as a reference to browser performance not file size.
Smaller file size -> faster loading (in theory... if gzipped, it's probably redundant).

Possibly faster parsing, because the parser has less HTML to go through. (also probably not valid, because I'd be pretty sure that reading a string from memory is not the bottleneck in parsing, compared to logic, memory allocations, etc).

It could make a difference for Google's server infrastructure though.

If they have to download a tiny bit less, and save a tiny bit on CPU cycles and memory for each page, it might still lead to considerable savings.

> I'm wondering if there's some performance gain by the browser not having to parse the implicit optional tags.

The motivation behind this style is not browser parsing perf - it's network perf. The smaller your HTTP response, the fewer packets (and round trips) required to transmit it.

If your output is compressed (which it should be if you're worried about response size) then omitting end tags has much less impact, I believe. All of the tags should get compressed well because they're repeated so often, and they should be much smaller overall than your non-repetitive content.
But note that at the scale of moving as much data around as Google does, or even "the web as a whole", shaving even a few bytes off of every single gzip packet stream can still equate to significant network relief.
I suspect their advice is for their benefit, not other website devs. They can save a lot of space in their archive if everyone's pages were smaller. Nothing compared to better image compression though.
No - a few bytes on a web page are insignificant compared to the data volume of images and movies. This is all about getting pages to load faster on mobile.
If those extra bytes drop you from two packets to one, that's a _significant_ reduction in traffic

(which, IIRC, was the original rationale behind that style guide rule)
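The packet-boundary argument is simple arithmetic. A back-of-the-envelope Python sketch, assuming a TCP maximum segment size of 1460 bytes (typical for Ethernet, but an assumption here) and ignoring HTTP headers, TLS overhead, and congestion-window effects:

```python
import math

# Assumption: 1500-byte Ethernet MTU minus 40 bytes of IPv4 + TCP headers.
ASSUMED_MSS = 1460


def segments_needed(payload_bytes, mss=ASSUMED_MSS):
    """Number of full-or-partial TCP segments needed for a payload."""
    return math.ceil(payload_bytes / mss)


if __name__ == "__main__":
    before = 1500  # hypothetical compressed response size in bytes
    after = before - 50  # the same response after trimming 50 bytes
    print(segments_needed(before), "->", segments_needed(after))
```

The saving only matters when the response happens to sit just above a segment boundary; a response of 2000 bytes needs two segments before and after the same trim.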

If you would gzip your output like you should, how much does that even buy? There's usually something better to use your time for instead of trying to shave 500 bytes out of your page.
It's not a competition, though. If there's something better to do, also do that. However, that does still leave the question of how many bytes are actually saved in transport, especially with gzipping. The benefit here is absolutely not individual developers or even individual sites, but the data transferred by entire data centers over the course of a day, week, month etc. If this recommendation can bring down the total byte transmission for "the web" by 0.001% for instance, that's still a boatload of bytes that don't bog down the network anymore.
When you're looking at fractions of a percent, remember to consider other options. Set up brotli, for example. Or redesign your site to have a leaner layout. You might not ever reach the efficiency level where optimizing optional tags is the best use of dev time.

And the overhead of tracking which tags are optional in which circumstances is not particularly small. Consider that the extra complexity could impede more optimizations in the future, especially now that your markup requires a more complex parser than it could have needed.

Have you looked at the size of Youtube and Netflix videos?

According to this study [1], 70% of web traffic is video streaming. Only 8% is web browsing (which might include images, because they are not mentioned anywhere else - didn't find any info on that).

This is not going to make any difference.

Just because the vast majority of roads are for cars doesn't mean we should therefore not try to optimize the bike and pedestrian lanes.

Sure, a lot of the traffic is streamed data rather than HTML, but 30% of close to a zettabyte of data in a single day (for the internet as a whole) is still hundreds of petabytes that can be made drastically smaller. When the numbers are that large, even optimizing for something as "insignificant" as 0.01% of the traffic means 10s or even 100s of terabytes not pumped through the network every day.

compression and encryption often don't play nice with each other. See CRIME and BREACH, for example.
The double negative phrasing Google and the spec use makes it sound weirder than it is. You could phrase it as "only use tags that are needed for the document to be parsed correctly" which makes explicitly including an <html> tag with no attributes or an information-free stack of closing tags seem like a strange thing to do if it wasn't tradition.
file size? It's not much, but it would still strip some stuff.

I'm still bitter that HTML/XML works based off of explicit closing tags (where you can mistakenly close the wrong tag) instead of something like braces.

Use a build tool (which you should be doing anyway if you hand-write any markup, because you need to validate it) and make it rewrite </> to the relevant closing tag, if necessary... problem solved? (and yes, you'd be free to even leave </> off in many, many places: https://www.w3.org/TR/html5/syntax.html#optional-tags).

Alternatively, don't use HTML at all. Use pug (formerly "jade") or something and now you're free from all those inconvenient angle brackets.

After using pug, I don't think I can go back to plain HTML. I didn't know how much I hated closing tags.
Maybe it reduces the load on Google's crawls of the web.
Less HTML to load? Probably makes no difference in most cases, but it is less to load.

Many React/Webpack flows do something like this (minify or use a barebones template HTML).

Seems like something you can add a build step for though. Less human overhead.
Sorry I wasn't very clear, that's what I meant by "React/Webpack" flow! See: https://github.com/ampedandwired/html-webpack-plugin
but I think in general it is a mistake to make developers think about when and where certain things can be omitted.

Yes, sometimes it is better to make developers think about when and where certain things are required.

I agree, but maybe a transformer step could do it automatically. Write full HTML, generate less.
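As a rough idea of what such a transformer step might look like, here is a regex-based Python sketch that drops </p> where the spec allows it. It is not a real minifier: it assumes well-formed input, does not understand comments or script/pre content, and lets whitespace that followed the </p> end up inside the paragraph in the parsed DOM. All names are made up for illustration.

```python
import re

# Start tags after which a preceding </p> may be omitted.
FOLLOWERS = (
    "address|article|aside|blockquote|details|div|dl|fieldset|figcaption|"
    "figure|footer|form|h1|h2|h3|h4|h5|h6|header|hgroup|hr|main|menu|nav|"
    "ol|p|pre|section|table|ul"
)
# Parents whose last <p> child must keep its end tag.
EXCLUDED_PARENTS = frozenset("a audio del ins map noscript video".split())

# </p> followed by optional whitespace and one of the listed start tags.
# The inner lookahead stops "main" from matching a tag like <main-menu>.
_BEFORE_START = re.compile(
    r"</p>(?=\s*<(?:%s)(?=[\s>/]))" % FOLLOWERS, re.IGNORECASE
)
# </p> followed by optional whitespace and the parent's end tag.
_BEFORE_END = re.compile(
    r"</p>(?=\s*</([a-z][a-z0-9-]*)\s*>)", re.IGNORECASE
)


def _drop_unless_excluded(match):
    parent = match.group(1).lower()
    # Assumption: a hyphen in the name marks an autonomous custom element.
    if parent in EXCLUDED_PARENTS or "-" in parent:
        return match.group(0)
    return ""


def strip_optional_p_end_tags(html):
    """Remove </p> end tags that the omission rule makes optional."""
    html = _BEFORE_START.sub("", html)
    return _BEFORE_END.sub(_drop_unless_excluded, html)


if __name__ == "__main__":
    full = "<body><p>One</p>\n<p>Two</p>\n<div><p>Three</p></div></body>"
    print(strip_optional_p_end_tags(full))
```

The source stays fully explicit and only the generated output gets leaner, so nobody has to keep the rule in their head while editing.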
After 20 years of composing HTML, the world can do better than writing full HTML by hand: use technology like jade and not worry about what goes to the browser...
Why not worry about what goes to the browser? In my eyes, what actually runs in the browser is the only thing that matters in the end. You could still write in something like Jade, transpile, minify, then strip unneeded tags, all with automation.
I didn't mean not to care what goes to the browser, I meant that if tools like jade do it right for us, the rest of us no longer have to care about those little details.

Frankly I'm amazed the HTML way of verbose writing still stands after all these years in a fast paced industry.

Hadn't heard of Jade myself so your post inspired me to go looking. On http://learnjade.com/ the front page example shows that Jade doesn't take advantage of this ability to omit the closing </p> tag. So while I agree with you that html is not the best form for authors to write in, Jade itself still has room for improvement.
Would that be jade's job? I'd argue jade's job is to make writing html easier. Another tool should take on html optimization.
> A p element's end tag may be omitted if the p element is immediately followed by a block-ish element, or if there is no more content in the parent element.

> This doesn't apply if you are doing weird stuff in a non-block-ish element, or a media element, or a custom element.

is the easier way to think about it usually

It's really just better to keep a closing p tag, so you don't have to care about the consequences when you edit that part later... Does not typing </p> save anything? No.
I honestly prefer it when editing

    <p>
        It naturally acts as a clean way to segment
        paragraphs of text
    <p>
        And most of the tag-closing rules are roughly
        matched with the rules of using p tags altogether.
    <p>
        e.g. you can't have a div within a paragraph, so
        closing or not closing, divs can only come after
        paragraphs!
It actually saves at least 4 bytes per closing tag. On a larger webpage, that could easily add up to saving hundreds or thousands of bytes per request. That's a significant savings, especially for mobile.
gzip makes it insignificant.

I just took a sample page out of here which has bunch of p tags open and closed, gzipped the original and the one with </p> stripped, difference was 39 bytes.

https://en.wikipedia.org/wiki/C_(programming_language)
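If anyone wants to repeat that kind of measurement on their own pages, a minimal Python sketch could look like this. The function name is hypothetical, and it blindly removes every </p> purely to compare sizes, which is not necessarily valid under the omission rules.

```python
import gzip


def closing_p_savings(html):
    """Compare raw and gzipped sizes with and without </p> end tags."""
    raw = html.encode("utf-8")
    stripped = html.replace("</p>", "").encode("utf-8")
    raw_gz = gzip.compress(raw)
    stripped_gz = gzip.compress(stripped)
    return {
        "raw_bytes": len(raw),
        "raw_saved": len(raw) - len(stripped),
        "gzip_bytes": len(raw_gz),
        "gzip_saved": len(raw_gz) - len(stripped_gz),
    }


if __name__ == "__main__":
    # Synthetic stand-in for a saved page; substitute real HTML to measure it.
    sample = "".join(
        "<p>Paragraph number %d with some filler text.</p>\n" % i
        for i in range(200)
    )
    print(closing_p_savings(sample))
```

The raw saving is always 4 bytes per end tag; the gzipped saving depends entirely on the page, so it is worth measuring rather than guessing.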

Ironically, if end tags were truly non-optional, html might actually compress better, because it would have less entropy (fewer choices). In practice, it would allow for a compression filter to represent the tree structure in a less redundant form with fewer corner cases to deal with (much like compressors do for binaries, for example).
Thousands? Over 250 p tags on a page?
It's possible/likely that's what they mean. However the final markup is generated, make it minimal, shout-out to react, packagers, minifiers, etc
I write HAML. It's comfortable, concise and strict. It outputs nicely formatted HTML.