by jerf 3567 days ago
The "bad input" is, arguably, no longer bad input. The standard has been redefined to strictly specify what to do with that "bad" input, and if you don't handle it exactly as the standard specifies, it won't do what you "want" it to do.

That's not "being liberal in what you accept". Being liberal in what you accept is what we had before HTML 5, where the standard specified the "happy case" and the browsers were all "liberal in what they accept", in different ways. I am not stretching any definitions here or making anything up, because "liberal in what you accept" behaviors in the real world demonstrably work this way; everybody is liberal in different ways. It can hardly be otherwise; it isn't "being liberal in what you accept" if you accept exactly what the standard permits, after all. When liberality is permitted, what happens in practice is that out-of-spec input is handled in whatever way is most convenient for the local handler, in the absence of any other considerations (such as deliberately trying to be compatible with the quirky internal details of the competition). Browsers leaked a lot about their internal differences if you observed how they tended to handle out-of-spec input. Thus a standard like HTML5 that clearly specifies how to handle all cases is fundamentally not "liberal in what it accepts" anymore.

Instead, it is a rare, if not unique, example of a standard that has been rigidly specified after a couple of decades of seeing exactly how humans messed up the original standard. It is, nevertheless, now quite precise about what to do about the HTML you encounter. You aren't allowed to be "liberal", you're told exactly what to do.
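
To make "everybody is liberal in different ways" concrete, here is a minimal sketch in Python. Both recovery functions are hypothetical: neither is meant to reproduce any real browser or the HTML5 parsing algorithm, and the tokenizer only understands bare `<name>` and `</name>` tags. Each one picks a locally convenient way to deal with a mismatched end tag, and they disagree on the same out-of-spec input.

```python
import re

# Toy tokenizer: bare start tags, bare end tags, and text. No attributes.
TOKEN = re.compile(r"</?([a-z]+)>|[^<]+")

def tokenize(markup):
    """Yield ('open' | 'close' | 'text', value) pairs."""
    for m in TOKEN.finditer(markup):
        tok = m.group(0)
        if tok.startswith("</"):
            yield ("close", m.group(1))
        elif tok.startswith("<"):
            yield ("open", m.group(1))
        else:
            yield ("text", tok)

def recover_ignore(markup):
    """Hypothetical handler A: drop any end tag that does not match
    the innermost open element."""
    out, stack = [], []
    for kind, val in tokenize(markup):
        if kind == "open":
            stack.append(val)
            out.append(f"<{val}>")
        elif kind == "close":
            if stack and stack[-1] == val:
                stack.pop()
                out.append(f"</{val}>")
            # otherwise: silently ignore the end tag
        else:
            out.append(val)
    while stack:  # close whatever is still open at end of input
        out.append(f"</{stack.pop()}>")
    return "".join(out)

def recover_pop(markup):
    """Hypothetical handler B: a mismatched end tag closes every
    element up to and including its matching open element."""
    out, stack = [], []
    for kind, val in tokenize(markup):
        if kind == "open":
            stack.append(val)
            out.append(f"<{val}>")
        elif kind == "close":
            if val in stack:
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == val:
                        break
            # an end tag with no open element at all is ignored
        else:
            out.append(val)
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)

bad = "<b><i>one</b>two</i>"
print(recover_ignore(bad))  # <b><i>onetwo</i></b>
print(recover_pop(bad))     # <b><i>one</i></b>two
```

Both handlers agree on well-formed input and only diverge once the markup is out of spec, which is exactly the situation a fully specified error-handling algorithm removes.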


> The "bad input" is, arguably, no longer bad input.

What? Yes it is! Defined behavior for invalid markup doesn't make that markup valid.

HTML5 doesn't refuse to accept anything that HTML 4 accepted. Defining behavior for invalid markup does not even impact "be liberal in what you accept"; the scope of what is accepted hasn't changed. It affects "be conservative in what you send"; in particular, it more closely matches that half of the principle.

> HTML5 doesn't refuse to accept anything that HTML 4 accepted.

It does. It doesn't accept NET syntax, i.e., `<p/This is the contents of a p element/`. (No browser ever supported this, but because HTML 4 is defined to be an SGML application and its DTD allows NET syntax to be used, it is theoretically conforming HTML 4.)

Ah good point, thanks for the correction.
(There's also another load of SGML bits of syntax that browsers have never supported which HTML5 doesn't support. Indeed, HTML 4 has a whole section of such things: http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.3)
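
For anyone who hasn't seen NET (null end tag) syntax: `<name/` opens the element and the next `/` closes it. A rough sketch of the expansion in Python, assuming no attributes and no nested markup inside the element (the `expand_net` helper is made up for illustration, not taken from any SGML tool):

```python
import re

# Assumed simplification: "<name/" is a NET-enabling start tag and the
# next "/" is the null end tag. Attributes and nesting are not handled.
NET = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")

def expand_net(markup):
    """Rewrite <name/text/ into the equivalent <name>text</name>."""
    return NET.sub(r"<\1>\2</\1>", markup)

print(expand_net("<p/This is the contents of a p element/"))
# <p>This is the contents of a p element</p>
```

Ordinary start and end tags pass through untouched, since the pattern only fires on a tag name followed directly by `/`.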