by lunaru 5143 days ago
Wouldn't it be more desirable to have these WYSIWYG editors serialize to a non-HTML markup (like Textile or Markdown) to reduce the hassle of sanitizing user input on the back end (e.g. stripping script and iframe tags)? What's best practice these days for storing and displaying rich-edit user input?
4 comments

I doubt Markdown, BBCode or anything similar is a good idea here. That's just introducing extra complexity, and what for? The point of Markdown is that it's simple for humans to read and write directly, which isn't applicable here.

The downsides of Markdown should be obvious:

* more code, both server and client-side (to implement the to-and-from conversion)

* more bugs (due to more code and the complexity of escaping valid input that happens to be markup in one or the other)

* fewer features (if the editor supports some HTML that doesn't map 1-to-1 to Markdown, you're in trouble)

* less future-proof/platform independent (html isn't going anywhere, but that markdown variant you're using with the custom extensions you needed might be subtly different in whatever language/platform/toolkit you'd prefer in 5 years).
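To make the escaping point concrete, here is a minimal sketch of what an editor would have to do when serializing plain text to Markdown: any literal character that Markdown treats as markup needs a backslash. The function name `escape_markdown` and the exact character set are assumptions for illustration (the set is the backslash-escape list from the original Markdown syntax); a real converter would need to be context-aware rather than escaping everywhere.

```python
import re

# Characters that the original Markdown syntax allows to be backslash-escaped.
# Assumption: escape them everywhere, which is safe but over-escapes
# (e.g. every "." and "-" gets a backslash, not only those that start a list).
MD_SPECIAL = re.compile(r"([\\`*_{}\[\]()#+\-.!])")


def escape_markdown(text: str) -> str:
    """Backslash-escape literal text so Markdown does not read it as markup."""
    return MD_SPECIAL.sub(r"\\\1", text)


if __name__ == "__main__":
    # A user typed literal asterisks in the WYSIWYG editor; without escaping,
    # "* 3 *" would be rendered as emphasis after the round trip.
    print(escape_markdown("2 * 3 * 4"))
    print(escape_markdown("snake_case_name"))
```

And that is only one direction: the HTML-to-Markdown and Markdown-to-HTML steps each need their own escaping rules, which is where the extra bugs tend to come from.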

HTML is by far the better choice. If there's an improvement to be had here, it's in using the (compatible) XHTML5 serialization to ease parsing. And it's quite likely already using that, since that's what browsers' rich-text editing generally produces.

I suppose it's strange to say this on HN, where the markup is, well, atrocious, but after using Markdown for ages in a variety of places, I find it simply more pleasant for the "advanced" user who doesn't want to memorize hotkeys or highlight text and press buttons just to apply some basic formatting.

It's a hit on reddit, GitHub and more for a good reason. They could have whitelisted things as well, but they chose not to.

If you would be happy with Markdown, you'll be happy with a whitelist-based HTML sanitizer. HTML sanitization is only a hassle if you take the blacklist approach in an attempt to allow much more than Markdown can express.

I've used AntiSamy, but there are many others and I don't know which is best. In general, though, I would call the whitelist approach best practice.
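For anyone wondering what "whitelist-based" means in practice, here is a minimal sketch using only Python's standard-library `html.parser`. This is not how AntiSamy works internally; the names `ALLOWED_TAGS`, `WhitelistSanitizer` and `sanitize`, and the particular tag list, are assumptions made up for illustration. For production use a maintained sanitizer library rather than this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attributes allowed on that tag.
ALLOWED_TAGS = {
    "p": set(), "br": set(), "b": set(), "i": set(), "em": set(),
    "strong": set(), "ul": set(), "ol": set(), "li": set(),
    "blockquote": set(), "code": set(), "pre": set(),
    "a": {"href"},
}
VOID_TAGS = {"br"}                                # tags with no closing tag
DROP_CONTENT_TAGS = {"script", "style", "iframe"}  # drop tag and its content
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    """Re-emits only whitelisted tags/attributes; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside script/style/iframe

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text content
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            # Only allow links with an explicitly safe scheme.
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'
    print(sanitize(dirty))  # <p>Hi <b>there</b></p>
```

The point of the design is that nothing gets through unless it is explicitly listed, so new or obscure tags and attributes are rejected by default instead of having to be anticipated one by one.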

Depends on the use case. If you are using it as a website authoring tool, it makes sense to store the rich-edit user input as HTML itself.

However, in a scenario like commenting or composing a message (where only limited editing options are available), storing it in a format such as Markdown makes sense.

Are you aware of any such editor that does store the data in Markdown or something similar?
Take a look at bergie's Hallo editor:

http://bergie.github.com/hallo/markdown.html

He also has an interesting project called Create:

http://createjs.org/

Not really what you're after, I guess, but a link anyway: http://code.google.com/p/pagedown/wiki/PageDown "PageDown is the JavaScript Markdown previewer used on Stack Overflow and the rest of the Stack Exchange network. It includes a Markdown-to-HTML converter and an in-page Markdown editor with live preview."
Even in comments, a whitelist of markup would save a lot of code on both ends.
If you are going to manipulate the document structure on the server, JSON might be an alternative. There is an interesting project on GitHub that does HTML -> JSON conversion: https://github.com/gregory80/fastFrag
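To give a rough idea of what an HTML -> JSON representation could look like, here is a small sketch built on Python's standard-library `html.parser`. It is not fastFrag's format or API; the node shape (`tag`, `attrs`, `children`) and the names `TreeBuilder` and `html_to_tree` are assumptions chosen for illustration.

```python
import json
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Builds a nested dict/list tree that can be dumped straight to JSON."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = {"tag": "#root", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.stack[-1]["children"].append(data)


def html_to_tree(html_text: str) -> list:
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()  # flush any buffered text
    return builder.root["children"]


if __name__ == "__main__":
    tree = html_to_tree('<p>Hello <b>world</b></p>')
    print(json.dumps(tree, indent=2))
```

Once the document is a plain tree like this, server-side manipulation (walking nodes, filtering tags, rewriting attributes) becomes ordinary list and dict handling instead of string munging.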