by jerf 3568 days ago
It should be pointed out that while this was once accepted as gospel, it has been coming under a lot of fire lately. HTML, once arguably the flagship of this principle and its greatest success (I say "arguably" because you can also argue for TCP), no longer works this way. HTML5 specifies how bad input should be handled, and if you accept "how to process nominally bad input" as the "real" standard, HTML is now strict in what it accepts. It's just that what it is strictly accepting appears quite flexible.

I'm not a big believer in it myself; "liberal in what you accept" and "comprehensible for security audits" are not quite directly opposed, but they certainly work against each other fairly hard. There's a time and a place for Postel's principle, but I consider it more an exception for exceptional circumstances than the first thing you reach for.


> HTML5 specifies how bad input should be handled, and if you accept "how to process nominally bad input" as the "real" standard, HTML is now strict in what it accepts.

HTML5 is a shining example of "be liberal in what you accept", and its improved documentation of how to handle bad input (note that bad input is still permitted!) greatly expands HTML's "be conservative in what you send". I think HTML5 is a perfect example of the Robustness Principle.

The "bad input" is, arguably, no longer bad input. The standard has been redefined to strictly specify what to do with that "bad" input, and if you don't handle it exactly as the standard specifies, it won't do what you "want" it to do.

That's not "being liberal in what you accept". Being liberal in what you accept is what we had before HTML5, where the standard specified the "happy case" and the browsers were all "liberal in what they accept", in different ways. I am not stretching any definitions here or making anything up, because "liberal in what you accept" behaviors in the real world demonstrably work this way; everybody is liberal in different ways. It can hardly be otherwise; it isn't "being liberal in what you accept" if you accept exactly what the standard permits, after all. When liberality is permitted, what happens in practice is that out-of-spec input is handled in whatever way is most convenient for the local handler, in the absence of any other considerations (such as deliberately trying to be compatible with the quirky internal details of the competition). Browsers leaked a lot about their internal differences if you observed how they tended to handle out-of-spec input. Thus a standard like HTML5 that clearly specifies how to handle all cases is now fundamentally not "liberal in what it accepts" anymore.

Instead, it is a rare, if not unique, example of a standard that has been rigidly specified after a couple of decades of seeing exactly how humans messed up the original standard. It is, nevertheless, now quite precise about what to do about the HTML you encounter. You aren't allowed to be "liberal", you're told exactly what to do.
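
To make the "everybody is liberal in different ways" point concrete, here is a toy sketch. Everything in it is hypothetical: `parse`, `ignore_stray_close` and `close_up_to_match` are made-up names, the tokenizer only understands bare `<tag>` and `</tag>`, and neither recovery rule is what any real browser or the HTML5 spec actually does. It only shows that two equally "liberal" handlers can build different trees from the same out-of-spec input, which is the gap a fully specified recovery rule closes.

```python
import re

# Toy tokenizer: bare <tag>, </tag>, or a run of text. Not real HTML.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def parse(markup, on_mismatch):
    """Build a nested [tag, children] tree; on_mismatch is the recovery rule."""
    root = ["root", []]
    stack = [root]
    for closing, tag, text in TOKEN.findall(markup):
        if text:
            stack[-1][1].append(text)
        elif not closing:
            node = [tag, []]
            stack[-1][1].append(node)
            stack.append(node)
        elif stack[-1][0] == tag:
            stack.pop()
        else:
            # Out-of-spec input: the end tag does not match the open element.
            on_mismatch(stack, tag)
    return root


def ignore_stray_close(stack, tag):
    # Hypothetical handler A: silently drop the unexpected end tag.
    pass


def close_up_to_match(stack, tag):
    # Hypothetical handler B: implicitly close open elements until the
    # matching one is closed; drop the end tag if nothing matches.
    if tag in [node[0] for node in stack[1:]]:
        while stack.pop()[0] != tag:
            pass


misnested = "<b>one<i>two</b>three</i>"
tree_a = parse(misnested, ignore_stray_close)
tree_b = parse(misnested, close_up_to_match)
print(tree_a)
print(tree_b)
```

Handler A leaves "three" inside the `i` element, while handler B puts it at the top level after the `b` element. Both "accepted" the input; they just disagree about what it meant.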

> The "bad input" is, arguably, no longer bad input.

What? Yes it is! Defined behavior for invalid markup doesn't make that markup valid.

HTML5 doesn't refuse to accept anything that HTML 4 accepted. Defining behavior for invalid markup does not even impact "be liberal in what you accept"; the scope of what is accepted hasn't changed. It affects "be conservative in what you send"; in particular, it more closely matches that half of the principle.

> HTML5 doesn't refuse to accept anything that HTML 4 accepted.

It does. It doesn't accept NET syntax, i.e., `<p/This is the contents of a p element/`. (No browser ever supported this, but because HTML 4 is defined to be an SGML application and its DTD allows NET syntax to be used, it is theoretically conforming HTML 4.)

Ah good point, thanks for the correction.
(There's also a whole load of other SGML syntax that browsers have never supported and that HTML5 doesn't support either. Indeed, HTML 4 has a whole section on such things: http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.3)
I think web browsers are a better example of the principle. The HTML parsing/DOM tree system is usually pretty forgiving about missing or malformed tags, but still always returns a result rendered as if the HTML had been written to spec.