by wvenable 340 days ago
> Unfortunately, while an XML schema can be simple, it can also be unnecessarily complex, bloated, convoluted and difficult to implement without specific knowledge of its features.

One could now use that exact sentence to describe the most popular open document format of all: HTML and CSS.

4 comments

Can you be more specific here? HTML and CSS can't be described like that in my opinion.

It is complex but not complicated. You can start with just a few small parts and get to a usable and clean document within hours of first contact with the languages. The tags and rules are usually quite self-describing yet concise, and there are tons and tons of good docs and tools. The development of the standards is also open and you can peek there if you want to understand decisions and rationales.

You could say the existing browser vendors pushed to make the HTML standard more complicated to the point that there's no chance for a newcomer to compete with the existing ones.
Ladybird would like a word.

Though I agree that the web standards are extremely large. Not sure if they are too large, given their cross-platform, near-OS-layer functionality.

It's not about making a document.

It's about making software that would display a document in that format correctly.

I.e., a browser.

The current HTML spec alone is a 1000+ page PDF, and I can't imagine the CSS spec being much shorter.

Wordsmithing your way around this doesn't make them any easier.

Sure, technical documents are long, but that still doesn’t support the original claim that they are “unnecessarily complex, bloated, convoluted”, and it’s actually evidence against the assertion that they’re “difficult to implement without specific knowledge of its features”. Most of why those are long documents is that they carefully describe how necessarily complex systems interact, in sufficient detail to implement them, whereas the Office XML specs, at least historically, had things like flags telling implementers to behave like, say, Word95 without fully specifying the behaviour in question.
The original claim was clearly just an opinion; I don't think there's merit to treating it as a series of logical statements, or to analyzing it in intricate depth in general.

Evidence for this is in the very words used: unnecessary, complex, bloated, convoluted. These are very human terms that are thus subject to personal interpretation and opinions.

It shouldn't be surprising then that their "claim" thus fails scrutiny. All they actually meant to say is that HTML and CSS are both verbose standards with a lot of particularities - still something subjective, but I think page / word / character counts are pretty agreeable attributes to estimate this with in an objective way. Hence why I brought those up exactly.

Of course it’s an opinion: the point is that it’s neither persuasive nor internally consistent. They haven’t given any reason to believe they have enough domain knowledge to compare the two authoritatively. It’s also inconsistent to criticize OOXML for being difficult to implement without extra knowledge and then to criticize a truly open spec for being detailed enough to implement without extra knowledge. The entire HTML5 process was intended to reduce the number of cases where people were relying on things which required implementers to know how a specific engine like IE worked.
Sure, the spec might be enormous, but you don't need to touch it at all to be productive quickly. No HTML or CSS tutorial I've ever seen referenced the spec, nor did I need to go there to solve something. And that in itself is further proof of how nicely it is designed. On the other hand, there are other document types or schemas where you absolutely have to go to the spec, because it is so cryptic, badly designed and not self-explanatory that there is nothing else you can do.
HTML and CSS tutorials are for people authoring HTML and CSS documents, not for people authoring HTML and CSS parsers and renderers.
Since HTML is valid XML, it really is perfectly acceptable to say it's the same!

  <p>I don't think that's true.<br>
  Perhaps you're thinking of xhtml?
Observe the lack of a closing p tag, to say nothing of the multiple self-closing tags in html: hr, img, link, meta, ...

https://html.spec.whatwg.org/multipage/grouping-content.html...
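For anyone who wants to check this, here is a minimal Python sketch, assuming only the standard library. It feeds the snippet above to an XML parser and to a lenient HTML tokenizer. The names `SNIPPET`, `XHTML_VARIANT`, `parses_as_xml` and `html_start_tags` are made up for illustration, and `html.parser` is a forgiving tokenizer rather than a spec-compliant HTML5 parser, so this only illustrates the well-formedness difference.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The HTML snippet from the comment above: no closing </p>, bare <br>.
SNIPPET = "<p>I don't think that's true.<br>\nPerhaps you're thinking of xhtml?"

# A hypothetical XHTML-style rewrite with every element explicitly closed.
XHTML_VARIANT = "<p>I don't think that's true.<br/>\nPerhaps you're thinking of xhtml?</p>"


class TagCollector(HTMLParser):
    """Records the start tags seen by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parses_as_xml(text):
    """Return True if `text` is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_start_tags(text):
    """Return the start tags an HTML tokenizer finds in `text`."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print("snippet is well-formed XML:", parses_as_xml(SNIPPET))
    print("XHTML variant is well-formed XML:", parses_as_xml(XHTML_VARIANT))
    print("HTML tokenizer sees start tags:", html_start_tags(SNIPPET))
```

The XML parser rejects the original snippet because `p` and `br` are never closed, while the HTML tokenizer reads it without complaint.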

Yeah, but those are open standards, whereas Microsoft is the only one with true knowledge of its XML.
You know you're referring to ECMA-376 and ISO/IEC 29500?
In OP's defense, is there a freely available reference implementation of that standard? I know that LibreOffice certainly tries, but I'd guess theirs is closer to a reverse-engineered implementation than a reference one.
yes HTML and CSS have gone unhinged without question

big reason for that is that they were not designed for modern requirements like being used as a general purpose application UI toolkit

especially CSS was designed for printable documents, not modern websites

and HTML was designed to represent the core semantic structure of a "classical" document (and not a too fancy one either), with minimal formatting (e.g. bold, italic, underline) but even on old websites it was very common to not be used like that at all (e.g. think old table for the whole site to create header and side bar tricks, now doable nicer with HTML5/modern CSS)

so it's kinda a markup and style language chosen in the very early internet days, only to realize shortly later that websites were developing in a direction very mismatched to the designs of both languages (but both happen to be squeezable into their new roles, barely).

Kinda funny. But not really the situation behind OOXML.

This is similar in zero ways.