by nowarninglabel 4584 days ago
Some people wonder what this is good for. For me, it would have been great when I worked on a training ship as the IT support. Students were not allowed access to the internet (nor could our meager satellite uplink have supported it), but for my second stint on the ship, I wanted to see if I could provide them with Wikipedia. So I grabbed the dump, set up MediaWiki, and imported it... and waited... for days and days, and eventually the thing loaded, and it was great. But it would have been super nice to have an easy installer to handle all that. So yes, there are use cases out there.
2 comments

That's pretty impressive. I never had the patience to sit through a full MediaWiki import for en.wikipedia.org.

Just to be clear, XOWA isn't an installer for MediaWiki, but its own app. This allows it to avoid the dependency on the entire MediaWiki tool-chain (Apache, PHP, MySQL, MediaWiki). Unfortunately, this means that XOWA has to reproduce the same logic, which is quite a challenge...

It is indeed a challenge. The MediaWiki syntax is the weirdest mess I have ever had to parse. There is no spec, real-world usage deviates significantly from the help docs, and it's a Turing-complete language with heaps of backwards-compatibility hacks. So if you have something reasonably complete and correct, then kudos to you!
Thanks. The syntax was challenging, especially all the template syntax ("{{my_template|{{{argument1|defaultvalue|{{nested_template}}}}}}}"). Fortunately, the new Lua module should eventually replace the template syntax, which should make it easier for future parsers.
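
To give a rough feel for why the nesting alone is awkward, here is a minimal sketch in Python of a stack-based scanner that turns the example above into a tree. It is not how XOWA or MediaWiki actually parse wikitext. It assumes well-formed input, treats three opening braces as an argument and two as a template, and ignores ambiguous runs of four or more braces. The names Node and parse_braces are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "root", "template" for {{...}}, or "argument" for {{{...}}}
    children: list = field(default_factory=list)  # text fragments and nested Nodes


# Closing delimiter expected by each kind of open node (hypothetical simplification).
CLOSERS = {"template": "}}", "argument": "}}}"}


def parse_braces(text: str) -> Node:
    """Build a tree from well-formed {{...}} / {{{...}}} nesting."""
    root = Node("root")
    stack = [root]
    i = 0
    while i < len(text):
        top = stack[-1]
        closer = CLOSERS.get(top.kind)
        if closer and text.startswith(closer, i):
            # The innermost open node ends here.
            stack.pop()
            i += len(closer)
        elif text.startswith("{{{", i) or text.startswith("{{", i):
            # Assumption: three braces open an argument, two open a template.
            is_argument = text.startswith("{{{", i)
            node = Node("argument" if is_argument else "template")
            top.children.append(node)
            stack.append(node)
            i += 3 if is_argument else 2
        else:
            # Plain text: extend the current fragment or start a new one.
            if top.children and isinstance(top.children[-1], str):
                top.children[-1] += text[i]
            else:
                top.children.append(text[i])
            i += 1
    if len(stack) != 1:
        raise ValueError("unclosed template or argument")
    return root


sample = "{{my_template|{{{argument1|defaultvalue|{{nested_template}}}}}}}"
tree = parse_braces(sample)
```

Here tree.children[0] is the outer template, its second child is the {{{argument1...}}} node, and that node in turn holds the nested template. Real wikitext needs far more than this (pipes, brace runs of four or more, parser functions), which is where the backwards-compatibility hacks come in.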
The visual editor uses a new parser, Parsoid, which has been implemented separately in node.js (iirc). That may be the answer...
Yup. It also has its own DOM, rather than continuously adding to one string and repeatedly running regexes on it (which is what MediaWiki does today).

I was already pretty far along with my own parser before Parsoid was usable though. (and my parser has its own DOM / hooks)

MediaWiki is such an astoundingly fugly piece of software.
Wouldn't it be easier to include all the original tools in a packaged form instead of reproducing their logic?
Yes, this would be the ideal approach, but it can become quite complicated (because the tool-chain needs to be installed on different machines). In addition, the official .xml importer (importDump.php) is not really up to the task (slow / sometimes buggy).

If you're interested in going this route, you can look at http://www.nongnu.org/wp-mirror/. This should build a local MediaWiki instance with one click. Keep in mind that it's a bit slow: it takes two days to build simple.wikipedia.org with images. In contrast, XOWA sets this up in about 30 min.

I read the next comment after this and it started with '> Space required during initial..'

And I thought: ah yes, 'Outer Space', that could be another location where an offline version could help.