by bct 4587 days ago
> easily get the content you need in JSON format or similar

When user stylesheets were conceived, this is what HTML was.


JSON is a data format. HTML is a markup language. You don't store data as HTML, because it's not intended to store data, only presentational markup.

> You don't store data as HTML

You do if your data is a document.

And after CSS was introduced, HTML was definitely not supposed to be presentational.

> You do if your data is a document.

A document mixes information with presentation. It's not data. Because it mixes information with presentation, you have to re-prepare it for each medium you intend to display it on.

HTML is a poor document format. If you store your document as HTML, then you also have to store the CSS along with it or it won't be a complete document. The alternative is to include the styling in the HTML file, a kludge which violates the Single Responsibility Principle.

There are plenty of document formats out there that don't have this issue, like PDF or RTF.

If you want your document to look more like data, then what you do is factor out all the atomic bits of information into values that you can then input into a database. Then write code to present it. This works well for structured documents like orders, invoices, or reports, less so for unstructured information like blog posts. For these, mixing presentation with information is unavoidable: just adding bold-face to a word means you'll need to store presentation information in your data. The "semantic web" is intended to address this.
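
To make the structured case concrete, here is a minimal sketch in Python. The invoice fields and the two renderers are made up for illustration; the point is only that the stored values carry no markup, and each medium gets its own presentation code.

```python
from html import escape

# Hypothetical invoice: every atomic fact is a plain value, with no markup.
invoice = {
    "number": "INV-001",
    "customer": "Example Co.",
    "lines": [
        {"item": "Widget", "qty": 2, "unit_price": 4.50},
        {"item": "Gadget", "qty": 1, "unit_price": 10.00},
    ],
}


def total(inv):
    """Sum of quantity times unit price over all invoice lines."""
    return sum(line["qty"] * line["unit_price"] for line in inv["lines"])


def render_text(inv):
    """Presentation for a plain-text medium (terminal, email)."""
    rows = [f"Invoice {inv['number']} for {inv['customer']}"]
    for line in inv["lines"]:
        rows.append(f"  {line['qty']} x {line['item']} @ {line['unit_price']:.2f}")
    rows.append(f"Total: {total(inv):.2f}")
    return "\n".join(rows)


def render_html(inv):
    """Presentation for the web, built from the same values."""
    items = "".join(
        f"<tr><td>{escape(line['item'])}</td><td>{line['qty']}</td>"
        f"<td>{line['unit_price']:.2f}</td></tr>"
        for line in inv["lines"]
    )
    return (
        f"<h1>Invoice {escape(inv['number'])}</h1>"
        f"<p>{escape(inv['customer'])}</p>"
        f"<table>{items}</table>"
        f"<p>Total: {total(inv):.2f}</p>"
    )
```

Changing the layout of either output never touches `invoice`. A blog post has no equivalent of `invoice["lines"]` to factor out, which is where this approach stops working.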

> And after CSS was introduced, HTML was definitely not supposed to be presentational.

Presentation involves more than just style. CSS handles the style of web content, HTML handles the structure.

I'm familiar with the distinction between document and data. I'm talking about a site like a blog, where the document is the data. That was the original context of this conversation, IIRC.

HTML is emphatically not a poor document format, but it satisfies different requirements than PDF and RTF do. The weak connection between structure and what you call "style" (what I would call presentation) is a feature; it allows clients to modify the presentation to suit their needs. If I want a bigger font or higher contrast or a smaller column width, I'm able to do that because of the separation between structure and style.

But (as you rightly pointed out) this is a lot harder than it should be (and harder than it used to be). As a result, when an HTML document doesn't work for someone (due to its presentation) they have to complain to document creators instead of merely configuring their client (once) to suit their needs. This sucks.

> But (as you rightly pointed out) this is a lot harder than it should be (and harder than it used to be).

It's hard because doing this is moving in the wrong direction concerning the intended abstractions.

As I said earlier, a document combines unstructured information with presentation, and you have to re-do the document every time the presentational logic changes. This is necessary because the information in the document is unstructured; it's not like an order form.

Because you cannot predict what form unstructured information will take, the presentational logic is necessarily strongly coupled with the information. That's why it's hard to do what you want. You can pop open Dev Tools and manually do it, but you can't write a program that will take _all blog posts_ and restructure them the way you want to, because _all blog posts_ is impossible to reason about.

No amount of evolution to the HTML or CSS standards will work out this particular bit of complexity. If you could standardize a blog post, then you could write a program to do it. But it would only work on posts that meet the standard.
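
As a rough sketch of that last point: the "standard" below (a dict with a title and a list of paragraph strings) is purely hypothetical, and so are the function names. The restyler is trivial to write, but it has to refuse anything outside the standard.

```python
import textwrap

# Hypothetical "standard": a post is a dict with a title string
# and a list of paragraph strings. Nothing like this actually exists.
REQUIRED = {"title": str, "paragraphs": list}


def conforms(post):
    """True only if the post matches the assumed standard exactly."""
    return (
        isinstance(post, dict)
        and all(isinstance(post.get(key), kind) for key, kind in REQUIRED.items())
        and all(isinstance(p, str) for p in post["paragraphs"])
    )


def restyle(post, width=40):
    """Reflow a conforming post to the reader's preferred column width."""
    if not conforms(post):
        raise ValueError("post does not meet the assumed standard")
    body = "\n\n".join(textwrap.fill(p, width=width) for p in post["paragraphs"])
    return f"{post['title'].upper()}\n\n{body}"
```

Hand it a conforming post and the reader controls the column width. Hand it the raw HTML of an arbitrary blog and all it can do is raise `ValueError`.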

Say you made it so every blog CMS out there stored the text in the DB in Markdown format and provided an API so you could get at the Markdown. Then you could do what you say and provide your own styling. What this would be doing is introducing a separation of concerns. You push most of the structure out of the data, and divide up styling duties between a base level (Markdown) and an upper level (whatever you're using to display it). You may not even need the API if the CMS doesn't screw around too much with the presentation by putting, say, ads in the middle of the content. Then you could screen scrape and convert the generated HTML back into Markdown, but again, this is the wrong way to go (it depends on concretion rather than abstraction) and is prone to breakage. You really need the API layer to do this properly.
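
Here is a toy version of the screen-scraping route, using only Python's standard `html.parser`. The class and function names are made up, and it assumes posts use nothing but `<p>`, `<strong>`/`<b>` and `<em>`/`<i>`; it is not a real HTML-to-Markdown converter.

```python
from html.parser import HTMLParser


class HtmlToMarkdown(HTMLParser):
    """Toy converter: understands <p>, <strong>/<b>, <em>/<i> and nothing else."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "p":
            # A closed paragraph becomes a blank line in Markdown.
            self.parts.append("\n\n")

    def handle_data(self, data):
        # All text is kept, including text inside tags we do not understand.
        self.parts.append(data)


def html_to_markdown(html):
    parser = HtmlToMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

On clean output it round-trips fine: `html_to_markdown("<p>Hello <strong>world</strong></p>")` gives `Hello **world**`. Feed it `<p>Text</p><div class="ad">Buy now</div>` and the ad copy ends up in the "content", because the scraper has no idea which markup belongs to the post and which the CMS added. That is the breakage the API layer avoids.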

But you can't hope that one day HTML and CSS will make sense again like the old days and that user-styling will work again. It only worked before in the very early days of the web because everything was super simple and people could live with the edge cases that cropped up, not because the underlying domain changed. That solution was always brittle, and it broke the second people wanted greater flexibility in presentation.

> You may not even need the API if the CMS doesn't screw around too much with the presentation by putting, say, ads, in the middle of the content.

This is exactly what I'm saying.

> Then you could screen scrape and convert the generated HTML back into Markdown,

Why do you want the Markdown at all?