by niconii
1505 days ago
Just to add to this, HTML parsers will attempt to make sense of any random line noise you give them and turn it into a DOM. When I say something is "invalid" HTML, what I mean is that it's not allowed by the spec and will result in an error if you run it through a validator (which you should do!). For example, try running the following document through the W3C's HTML validator[1]:

  <!DOCTYPE html>
  <html lang=en>
  <title>Test Document</title>
  <p>
  <div></div>
  </p>

The HTML spec contains a list of all possible parse errors[2].

[1] https://validator.w3.org/nu/#textarea
[2] https://html.spec.whatwg.org/multipage/parsing.html#parse-er...
Of course, that screams "MISTAKE", and a validator should warn you about it, much like a linter flags an assignment inside an if condition in a C-like language when it isn't wrapped in extra parentheses. Leaving the parentheses out is allowed, but adding them is recommended.
And of course, that makes "valid HTML" (almost?) redundant. (There are probably "vocabulary" errors that are still possible, like a missing src attribute on an img or a missing title element in head; don't take my word on this, though.)
div in p is not invalid, it's outright impossible to obtain from HTML parsing.
You can obtain this by doing this in JavaScript:
Or by parsing as XHTML:
You get:
Which I realize is actually a bit scary: I go out of my way to write XHTML in the hope that any error will be caught, but parsing as text/html always produces a valid DOM, whereas parsing as XHTML won't necessarily.
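
For what it's worth, the XML half of this is easy to reproduce outside a browser. Below is a minimal sketch, assuming Python's standard xml.etree.ElementTree as a stand-in for a browser's XHTML parser. It is a generic XML parser that knows nothing about HTML content models, and the sample markup and variable names are mine, not the snippet from the comment above. The point is only that an XML parser keeps the div nested inside the p exactly as written:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the same <p><div></div></p> nesting,
# written as well-formed XHTML so an XML parser accepts it.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Test Document</title></head>"
    "<body><p><div></div></p></body>"
    "</html>"
)

# ElementTree stores namespaced tags as "{namespace}localname".
NS = "{http://www.w3.org/1999/xhtml}"

root = ET.fromstring(XHTML)
p = root.find(f"{NS}body/{NS}p")

# Strip the namespace prefix to get the plain element names of p's children.
children = [child.tag.split("}", 1)[1] for child in p]
print(children)  # ['div']
```

An HTML parser fed the same markup as text/html would not produce this tree, which is the asymmetry described above.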