Hacker News
by adolph 545 days ago
The article linked as “W3C specifications are bigger than POSIX.” is also worth reading.

The total word count of the W3C specification catalogue is 114 million words at the time of writing. If you added the combined word counts of the C11, C++17, UEFI, USB 3.2, and POSIX specifications, all 8,754 published RFCs, and the combined word counts of everything on Wikipedia’s list of longest novels, you would be 12 million words short of the W3C specifications.

https://drewdevault.com/2020/03/18/Reckless-limitless-scope....


Sorry, but that analysis is too sloppy to allow any such comparisons.

If you look at the scraped document list [1]:

* Most of these are not normative! They're not specifications; they're guides, recommendations, terminology explainers, and so on.

* A lot of documents are irrelevant to implementing a web browser (XSLT, XPath, RDF, XHTML, ITS, etc.).

* A lot are obsolete (e.g. SMIL, OWL).

* There are tons of duplicate versions (all of CSS 1-3 are included; multiple versions of HTML, MathML, and of course the irrelevant XML-based standards).

* Many standards are scraped both as individual section files and as a single complete.html file. He didn't notice this and counted both.
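To make the double-counting point concrete, here is a minimal sketch (not the original script) of a word counter that avoids it. It assumes a hypothetical scrape layout where each spec lives in its own directory, with its section files sitting next to its complete.html. The function names are made up for illustration, and splitting extracted text on whitespace is only an approximate word count.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_word_count(path: Path) -> int:
    """Approximate word count: extract text, then split on whitespace."""
    parser = _TextExtractor()
    parser.feed(path.read_text(encoding="utf-8", errors="replace"))
    parser.close()
    return len(" ".join(parser.chunks).split())


def naive_total(root: Path) -> int:
    """Counts every .html file, so a spec scraped both ways is counted twice."""
    return sum(html_word_count(p) for p in root.rglob("*.html"))


def deduplicated_total(root: Path) -> int:
    """Per directory, count only complete.html if it exists (assumed layout);
    otherwise count the directory's individual .html files."""
    total = 0
    spec_dirs = {p.parent for p in root.rglob("*.html")}
    for d in spec_dirs:
        complete = d / "complete.html"
        if complete.exists():
            total += html_word_count(complete)
        else:
            total += sum(html_word_count(p) for p in d.glob("*.html"))
    return total
```

Calling both functions on the same hypothetical scrape directory shows how much of the naive total comes from specs that were counted twice. Pruning non-normative, obsolete, and superseded documents would still be a separate, manual step.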

As a particularly egregious example, he includes every version of the Web Content Accessibility Guidelines (WCAG) standard, going back to 1999, each of which is large.

I haven't done a proper analysis myself (and a fair one would have to be thorough), but if you prune the list down to the core technologies (HTML5, CSS, ECMAScript, PNG/GIF/WebP, etc.), I'd wager the total is under a million words, or at the very least under 2 million. The ECMAScript spec alone is just 356,000 words.

[1] https://paste.sr.ht/~sircmpwn/475ad10f9ff9f63cd0a03a3f998370...