Hacker News
by taeric 1169 days ago
Yes, though many languages have lenient parsers. Most browser parsers, for example, will probably only be lenient if parsing "HTML."

    new XMLSerializer().serializeToString(new DOMParser().parseFromString("<a>hello < </a>", "text/html")) 
In my console, the above does what you would expect. And again, entities are a very dangerous part of XML and friends.

You are correct that if you tell it the input is XML, the browser will throw it back at you, just as the JSON parser will barf on JSON.parse("{'test':'value'}").
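
For anyone who wants to check the JSON half of that claim, here is a minimal sketch. The helper name `tryParseJson` is made up for illustration; the only real API used is `JSON.parse`, which throws a `SyntaxError` on malformed input.

```javascript
// Hypothetical helper: wraps JSON.parse so the outcome can be inspected
// instead of crashing the script.
function tryParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // JSON.parse signals malformed input with a SyntaxError.
    return { ok: false, error: err.name };
  }
}

// Single-quoted keys and strings are not part of the JSON grammar.
const singleQuoted = tryParseJson("{'test':'value'}");
// The same data with double quotes is valid JSON.
const doubleQuoted = tryParseJson('{"test":"value"}');

console.log(singleQuoted.ok, singleQuoted.error); // false SyntaxError
console.log(doubleQuoted.ok, doubleQuoted.value.test); // true value
```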


per specifications, json parsing is not lenient, html parsing is lenient
Right, and amusingly, more than a few json parsers are very lenient in this. That or folks abandon ship fairly quickly and go for another spec that is far more friendly.
well json definitely does not accept `{'test':'value'}` as valid input

any parser that behaves otherwise is pretty clearly buggy

json has many problems but parsing ambiguity is not really one of them
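
a rough sketch of what "not lenient" looks like in practice, using the built-in `JSON.parse` as the strict reference. the helper name `isStrictJson` and the sample inputs are assumptions picked for illustration; they are the kinds of extensions that lenient parsers sometimes accept, not a survey of any particular library.

```javascript
// Hypothetical helper: true only if the built-in strict parser accepts the text.
function isStrictJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

// Inputs that some lenient parsers tolerate but the JSON grammar does not allow.
const lenientOnly = [
  "{'a':'b'}",       // single quotes
  "{a: 1}",          // unquoted key
  "[1, 2,]",         // trailing comma
  '{"a": 1} // hi',  // comment
  "NaN",             // non-finite number literal
];

const accepted = lenientOnly.filter(isStrictJson);
console.log(accepted.length); // 0
console.log(isStrictJson('{"a": [1, 2]}')); // true
```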

Methinks you have never looked at the field. I'd as soon declare CSV an error-free format. That is only true if you ignore the proliferation of applications that get it wrong, often in subtle ways. Still wrong.
csv is wildly ambiguous, to the frustration of ~every data science engineer in industry

json is not

show me an application that parses `{'a':'b'}` as valid JSON, i'm actually interested, probably there are some which exist, but there is no ambiguity about those applications being wrong

To be pedantic, html parsing is not lenient, it is unambiguously specified.
if that were true then browsers would refuse to render text/html responses that didn't include a closing </html> tag, i guess
No, because the closing </html> tag can be omitted according to the current HTML spec. See https://html.spec.whatwg.org/#optional-tags
this is exactly my point

html is not precisely defined

Sorry, I don't understand your argument. HTML is fully and unambiguously defined, as you can see if you follow the link. Some tags are optional in certain contexts, but this is also precisely defined.
I think you're missing the point that it is defined. The current HTML5 spec says that <title> implies the existence of <head>, <body> implies the end of <head>, tags that belong in the body imply the end of <head> and the start of <body>, etc.

HTML5 is not XHTML.

    <!DOCTYPE html> <title>Title</title> <h1>Heading

expands to

    <!DOCTYPE html> <html><head><title>Title</title></head> <body><h1>Heading</h1></body></html>
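
If you want to see the implied tags for yourself, something like the following should work in a browser console. It is only a sketch: the function name `expandImpliedTags` is made up, the input closes <title> explicitly, and the function returns null where `DOMParser` is not available (plain Node.js, for example).

```javascript
// Hypothetical helper: parse as text/html and serialize the resulting tree,
// so the elements the parser inferred become visible.
function expandImpliedTags(source) {
  if (typeof DOMParser === "undefined") {
    // Not running in a browser; nothing to demonstrate.
    return null;
  }
  const doc = new DOMParser().parseFromString(source, "text/html");
  // outerHTML of the root element includes the inferred <head> and <body>.
  return doc.documentElement.outerHTML;
}

const expanded = expandImpliedTags("<!DOCTYPE html> <title>Title</title> <h1>Heading");
console.log(expanded);
```

The exact whitespace in the output depends on where the parser attaches the text nodes, so compare the element structure rather than the literal string.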