Hacker News
by atombender 539 days ago
Sorry, but that analysis is too sloppy to allow any such comparisons.

If you look at the scraped document list [1]:

* Most of these are not normative! They're not specifications, they're guides, recommendations, terminology explainers, and so on.

* A lot of documents are irrelevant to implementing a web browser (XSLT, XPath, RDF, XHTML, ITS, etc.).

* A lot are obsolete (e.g. SMIL, OWL).

* There are tons of duplicate versions (all of CSS 1-3 are included; multiple versions of HTML, MathML, and of course the irrelevant XML-based standards).

* Many standards are scraped both as individual section files, and as a single complete.html file. He didn't notice this, and counted both.

As a particularly egregious example, he includes every version of the Web Content Accessibility Guidelines (WCAG) standard, going back to 1999, each of which is large.

I have not done any kind of analysis myself (it would need to be thorough to be fair), but if you prune the list down to the core technologies (HTML5, CSS, ECMAScript, PNG/GIF/WebP, etc.), I'll wager the total is probably under a million words, or at the very least under 2 million. The ECMAScript spec is just 356,000 words.
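
To illustrate the double-counting point, here is a rough Python sketch of how a word count could skip per-section files whenever the same directory also has a complete.html. The directory layout, the file names other than complete.html, and the helper names are all made up for illustration; this is not the script the original analysis used, and a fair count would also need to drop the non-normative, obsolete, and superseded documents.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def drop_duplicate_sections(paths):
    """Keep complete.html where a directory has one; otherwise keep its section files."""
    by_dir = defaultdict(list)
    for p in paths:
        by_dir[PurePosixPath(p).parent].append(p)

    kept = []
    for files in by_dir.values():
        complete = [f for f in files if PurePosixPath(f).name == "complete.html"]
        kept.extend(complete if complete else files)
    return sorted(kept)


def count_words(texts_by_path):
    """Whitespace-delimited word count over the deduplicated files only."""
    kept = drop_duplicate_sections(texts_by_path)
    return sum(len(texts_by_path[p].split()) for p in kept)


# Hypothetical scrape: spec-a was saved both ways, spec-b only once.
docs = {
    "TR/spec-a/complete.html": "alpha beta gamma delta",
    "TR/spec-a/intro.html": "alpha beta",
    "TR/spec-a/details.html": "gamma delta",
    "TR/spec-b/index.html": "one two three",
}
naive = sum(len(text.split()) for text in docs.values())
deduped = count_words(docs)
```

In this toy input the naive total counts every word of spec-a twice, which is the same inflation that counting both the section files and complete.html would produce in the real list.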

[1] https://paste.sr.ht/~sircmpwn/475ad10f9ff9f63cd0a03a3f998370...