by _mql 5103 days ago
Using this thread to extend the discussion about digital content authoring...

Prose relies on Markdown, the current de facto standard for content authoring on the web.

But what comes after?

If you ask me the answer is clearly: Semantic Rich Text Editing.

So if you like Prose, you might also like the idea of Substance, which is essentially about considering content as data and separating it from presentation. The challenging part here is to come up with web-based tools that reliably maintain plaintext and annotations separately. Once these tools are ready, a new generation of applications for collaborative content composition can be built. The resulting structured content is ready to be analyzed, visualized, turned into arbitrary output formats (PDF, ePub, …) or integrated with other applications.
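
To make "plaintext and annotations separately" a bit more concrete, here is a rough Python sketch. It is not Substance's actual data model; the `Annotation` class, the `render` function and the two mapping tables are names made up for illustration. It assumes annotations never overlap and does no escaping of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # inclusive character offset into the plain text
    end: int    # exclusive character offset
    kind: str   # semantic label, e.g. "emphasis" or "strong"


# Hypothetical presentation mappings: the same annotations, two output formats.
HTML_TAGS = {"emphasis": ("<em>", "</em>"), "strong": ("<strong>", "</strong>")}
MARKDOWN_MARKS = {"emphasis": ("*", "*"), "strong": ("**", "**")}


def render(text, annotations, marks):
    """Weave non-overlapping annotations back into the plain text."""
    out = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < pos:
            raise ValueError("overlapping annotations are not handled in this sketch")
        open_mark, close_mark = marks[ann.kind]
        out.append(text[pos:ann.start])  # untouched text before the annotation
        out.append(open_mark + text[ann.start:ann.end] + close_mark)
        pos = ann.end
    out.append(text[pos:])  # trailing text after the last annotation
    return "".join(out)


# The text itself carries no markup; the annotations live next to it as data.
text = "Content is data, not presentation."
annotations = [Annotation(11, 15, "emphasis"), Annotation(21, 33, "strong")]

print(render(text, annotations, HTML_TAGS))
print(render(text, annotations, MARKDOWN_MARKS))
```

The stored text never changes; only the mapping passed to `render` decides whether the result is HTML or Markdown.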

Imagine Prose, providing WYSIWYG editing in realtime, plus the concept of patches to suggest improvements to a particular document.

Related links:

- http://github.com/substance (see composer, surface, text as well as architecture and document repositories)

- https://github.com/prose/prose/issues/139

So if (and only if) you like that idea, pls support our entry for the Knight News Challenge's data call.

http://newschallenge.tumblr.com/post/25422992783/substance-t...

Or even better, start contributing! :)

Thanks,

Michael

2 comments

You have it entirely backwards. Markdown is successful precisely because it isn't a semantic format; it's a reaction to overengineered semantic formats like HTML. Markdown is a purely presentational format; asterisks don't represent some abstract notion of emphasis, they denote italic text. The whole point is that you don't "maintain plaintext and annotations separately"; the formatting is part of the text.
I agree on over-engineered semantic formats. HTML used to be a document representation format. But for me it has turned into a presentation format, with all the DOM manipulation etc. This is perfectly fine, I just had to realize it. Now I just use divs and spans for my layouts, not trying to 'render' semantics. However, on a higher level there should be some document representation formats that can be turned into anything. That's the idea.

Well and I really don't think inline styles / annotations are good.

>Well and I really don't think inline styles / annotations are good.

Oh but they are. For the vast majority of content creators, semantics are inseparable from presentation; an article I write really does have the semantics of looking how it looks, and can't be transformed into something different.

Just like in programming, to extract a good abstraction such as the semantic content of an article you need to test against three different implementations. But most people only write for and test against a single presentation version, so it should be no surprise that even if they try and separate the semantics and the presentation, they'll get it wrong.

How will Substance be an improvement on TeX/LaTeX? You can write content in LaTeX and use Pandoc to convert it to just about any other markup language. And it produces beautiful print proofs.
It's meant to be very strict about semantics (LaTeX actually isn't - it has a lot of style-related commands), and it will be extensible (so you can add your own content types). It's based on a JSON document model that describes a series of operations, which can be used to reconstruct any document state and also enable realtime collaboration. Authoring will take place in WYSIWYG fashion. By having a real data representation of the document, it can be turned into anything (e.g. LaTeX, PDF, HTML).
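
As a rough illustration of "a series of operations that can be used to reconstruct any document state", here is a minimal Python sketch. The operation names (`insert`, `update`, `delete`), the node fields and the `replay` helper are assumptions made up for this example, not the actual Substance format, and it leaves out the realtime collaboration part entirely.

```python
import json


def apply_op(nodes, op):
    """Return a new node list with one operation applied (input is not mutated)."""
    nodes = [dict(n) for n in nodes]
    if op["op"] == "insert":
        node = {"id": op["id"], "type": op["type"], "content": op["content"]}
        nodes.insert(op["index"], node)
    elif op["op"] == "update":
        for n in nodes:
            if n["id"] == op["id"]:
                n["content"] = op["content"]
    elif op["op"] == "delete":
        nodes = [n for n in nodes if n["id"] != op["id"]]
    else:
        raise ValueError(f"unknown operation: {op['op']}")
    return nodes


def replay(ops, upto=None):
    """Rebuild the document state after the first `upto` operations (all if None)."""
    nodes = []
    for op in ops[:upto]:
        nodes = apply_op(nodes, op)
    return nodes


# Hypothetical operation log, stored as plain JSON.
LOG = json.loads("""
[
  {"op": "insert", "index": 0, "id": "h1", "type": "heading", "content": "Title"},
  {"op": "insert", "index": 1, "id": "p1", "type": "text", "content": "A frist draft."},
  {"op": "update", "id": "p1", "content": "A first draft."},
  {"op": "delete", "id": "h1"}
]
""")

print(replay(LOG, 2))  # state after the two inserts
print(replay(LOG))     # final state
```

Because the log is the source of truth, any earlier state can be recovered by replaying a prefix of it, and an exporter only ever has to deal with the resulting list of typed nodes.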

We use Pandoc to support a variety of output formats already.

E.g. go to http://substance.io/michael/data-js and open the export dialogue.

However, we're working on a new architecture (https://github.com/substance/architecture) because we'd rather fix the underlying problems we've identified than treat the symptoms of the current implementation.