by fauigerzigerk 4583 days ago
It is indeed a challenge. The MediaWiki syntax is the weirdest mess I have ever had to parse. There is no spec, real-world usage deviates significantly from the help docs, and it's a Turing-complete language with heaps of backwards compatibility hacks. So if you have something reasonably complete and correct, then kudos to you!
2 comments

Thanks. The syntax was challenging, especially all the template syntax ("{{my_template|{{{argument1|defaultvalue|{{nested_template}}}}}}}"). Fortunately, the new Lua module should eventually replace the template syntax, which should make it easier for future parsers.
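To give a feel for why the nesting is painful, here is a minimal sketch of a recursive-descent parser for the double-brace template calls only. It is not how MediaWiki, Parsoid, or my parser actually does it; the function name `parse_template` and the dict layout are made up for illustration, and it deliberately ignores triple-brace arguments ("{{{...}}}"), links, and everything else in real wikitext.

```python
def parse_template(text, pos=0):
    """Parse one {{name|arg|...}} call starting at text[pos].

    Hypothetical helper for illustration only. Returns (node, next_pos),
    where node is {"name": str, "args": [list of str / nested nodes, ...]}.
    Triple-brace arguments ({{{x}}}) are NOT handled.
    """
    if not text.startswith("{{", pos):
        raise ValueError("expected '{{' at position %d" % pos)
    pos += 2
    parts = [[]]  # parts[0] is the name, parts[1:] are the arguments
    buf = []      # plain characters collected since the last delimiter

    def flush():
        # Move any buffered plain text into the current part.
        if buf:
            parts[-1].append("".join(buf))
            buf.clear()

    while pos < len(text):
        if text.startswith("{{", pos):
            # Nested call: recurse and store the child node in the tree.
            flush()
            node, pos = parse_template(text, pos)
            parts[-1].append(node)
        elif text.startswith("}}", pos):
            # End of this call: build the node and report where we stopped.
            flush()
            name = "".join(p for p in parts[0] if isinstance(p, str)).strip()
            return {"name": name, "args": parts[1:]}, pos + 2
        elif text[pos] == "|":
            # A top-level pipe starts the next argument.
            flush()
            parts.append([])
            pos += 1
        else:
            buf.append(text[pos])
            pos += 1
    raise ValueError("unclosed template")


tree, end = parse_template("{{outer|a|{{inner|b}}|c}}")
print(tree["name"], len(tree["args"]))
```

Even this toy version has to track nesting depth to know which pipe belongs to which call, and that is before the triple-brace ambiguity comes into play.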
The visual editor uses a new parser, Parsoid, which has been implemented separately in node.js (iirc). That may be the answer...
Yup. It also has its own DOM, rather than continuously adding to one string and repeatedly running regexes on it (which is what MediaWiki does today).

I was already pretty far along with my own parser before Parsoid was usable, though (and my parser has its own DOM and hooks).

MediaWiki is such an astoundingly fugly piece of software.