by chrismorgan 4505 days ago
You might want to be careful about security. Here's a harmless (at the time of writing!) XSS vulnerability: http://writebin.io/d/GTFx9CxV

    <img onerror='alert("uh oh")' src=z> ← that there is an example of the sort of security hole (XSS) this has. Remember: Markdown is by default a *horribly* insecure format.

Thanks for pointing this out. Haven't had much time to consider things like this yet.
I have this request: could you please make security your first priority when you design things? If you leave it until later, it'll keep on being left until later, and it's far too likely never to happen. At the very least, it is likely to be significantly more expensive to implement later and is more likely to have holes.
No, but I want to, I really do. It's just that early designs are more useful if they're pliable.
Gruber's decision to include HTML in the Markdown "spec" has always perplexed me.
The point of Markdown is to abstract away the common HTML use cases in a markup format. There are plenty of HTML semantics that would be both painful and unhelpful to map into Markdown. Like, I'm probably one of a small number of weirdos who uses code, samp, kbd and var. Good luck turning those into blobs of memorable punctuation.

Sometimes I want to put custom attributes on links: rel and class, but also RDFa attributes like property and typeof. Without the ability to make a link the old-fashioned way, you can't do that.

Markdown gives shortcuts for the common stuff and lets authors fall back to HTML tags for the fiddly bits it doesn't optimize. Without that fallback, Markdown would be fairly useless.

If you went the other route and tried to turn all of HTML into Markdown syntax, you'd have an unusable language. Including HTML is the least bad of the options, IMHO.

I'm pretty sure that's because Markdown was written specifically for pre-processing HTML blog posts. It was never meant to be a full-on replacement for HTML, but simply a way to make the HTML that you're most likely to use when writing such a post easier to write. If I'm the only person who can post to my blog, there are no security issues introduced by using Markdown. (Or at least, no more security issues than would be introduced if I could use pure HTML for posting on my blog.)
I've used it for that, and the inclusion of HTML is sometimes a nice "escape hatch": you can write pages mostly in Markdown, but include inline HTML when you need to add formatting or interaction to a page that Markdown's own syntax doesn't support.

Some Markdown parsers have an option to sanitize embedded HTML. Discount (and its bindings in various languages), for example, does a very basic s/</\&lt;/g when the "no html" option is used: http://www.pell.portland.or.us/~orc/Code/discount/
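
For anyone wondering what that substitution amounts to, here is a rough Python equivalent. It is only a sketch of the s/</\&lt;/g idea, not Discount's actual code, and the function name is made up for illustration. It is meant to run on the Markdown source before rendering.

```python
def escape_angle_brackets(markdown_source: str) -> str:
    """Hypothetical helper: mimic s/</&lt;/g so no raw tag can open."""
    # Every "<" becomes the entity "&lt;", so the browser shows it as text.
    return markdown_source.replace("<", "&lt;")


# The payload from the top of the thread.
payload = """<img onerror='alert("uh oh")' src=z>"""
print(escape_angle_brackets(payload))
# prints: &lt;img onerror='alert("uh oh")' src=z>
```

Note this only neutralises raw tags. It does nothing about Markdown's own link syntax, e.g. [click](javascript:alert(1)), so it probably shouldn't be the only defence.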

You have to be pretty careful with it, but you can use Bleach to whitelist the tags.
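
To show what whitelisting means in practice, here is a toy sketch using only Python's standard library, run on the rendered HTML. This is not Bleach and not a substitute for it. The tag list, attribute list, URL prefixes and all names are assumptions picked for illustration, and a real filter needs far more review than this.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists, chosen only for this example.
ALLOWED_TAGS = {"p", "em", "strong", "code", "samp", "kbd", "var", "a"}
ALLOWED_ATTRS = {"a": {"href", "rel", "class"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside is still emitted, escaped
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onerror, onclick, style, and so on
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on output.
        self.out.append(escape(data, quote=False))


def sanitize(rendered_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(rendered_html)
    parser.close()
    return "".join(parser.out)


print(sanitize("""<img onerror='alert("uh oh")' src=z>"""))
print(sanitize('<a href="javascript:alert(1)" rel="nofollow">x</a> <kbd>Ctrl</kbd>'))
```

The first call prints an empty line because img is not on the list. The second keeps the link and the kbd but drops the javascript: href. The point is the default-deny design: anything not explicitly listed is thrown away, rather than trying to enumerate everything dangerous.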