by jchw 614 days ago
In all fairness, running modest to large MediaWiki instances isn't easy. There are a lot of things that are not immediately obvious:

- For anything complex/large enough you have to enable `$wgMiserMode`, otherwise operations will take far too long and start timing out.

- You have to set `$wgJobRunRate` to 0 or a bunch of requests will just start stalling when they get assigned to calculate an expensive task that takes a lot of memory. Then you need to set up a separate job runner in the background, which can consume a decent amount of memory itself. There is nowadays a Redis-based job queue, but there doesn't seem to be a whole lot of documentation.

- Speaking of Redis, it seems like setting up Redis/Memcached is a pretty good idea too, for caching purposes; this especially helps for really complicated pages.
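
A rough sketch of what the first two bullets look like in `LocalSettings.php`. The two setting names come from the list above; the runner invocation in the comment (the `--maxjobs` limit and running it from cron) is an illustrative assumption, not a recommendation.

```php
<?php
// Fragment for LocalSettings.php (drop the opening tag when pasting into
// an existing file). Values here are illustrative assumptions.

// Disable or limit features that get too expensive on large wikis.
$wgMiserMode = true;

// Never run queued jobs as part of a web request.
$wgJobRunRate = 0;

// With the rate at 0, jobs only run when something invokes the runner
// separately, e.g. from cron or a service unit (limit is an assumption):
//   php maintenance/runJobs.php --maxjobs 100
```

Capping the number of jobs per invocation is one way to keep the background runner's memory use bounded, since each run exits after the cap and starts fresh.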

Even to this day running a Wiki with an ambient RPS is kind of hard. I actually like MediaWiki because it's very practical and extensible, but on the other hand I know in my heart that it is a messy piece of software that certainly could make better use of the machine it's running on.

The cost of running a wiki has gone down over time in my experience though, especially if you are running things as slim as possible. A modest Digital Ocean machine can handle a fair bit of traffic, and if you wanted to scale up you'd get quite a boost by going to one of the lower end dedicated boxes like one of the OVHcloud Rise SKUs.

If anyone is trying to do this I have a Digital Ocean pro-tip. Don't use the Premium Intel boxes. The Premium AMD boxes are significantly faster for the money.

One trap I also fell into was I thought it might be a good idea to throw this on a hyperscaler, you know, Google Cloud or something. While it does simplify operations, that'll definitely get you right into the "thousands of dollars per month" territory without even having that much traffic...

At one point in history I actually felt like Wikia/Fandom was a good offering, because they could handle all of this for you. It didn't start out as a bad deal...


This is so true.

I adopted MediaWiki to run a knowledge base for my organization at Microsoft ( https://microsoft.github.io/code-with-engineering-playbook/I... ).

As I was exploring self-host options that would scale to our org size, it turned out there was already an internal team running a company-wide multi-tenant MediaWiki PLATFORM.

So I hit them up and a week later we had a custom instance and were off to the races.

Almost all the work that team did was making MediaWiki hyper efficient with caching and cache generation, along with a lot of plumbing to have shared infra (AD auth, semi-trusted code repos, etc.) that still allowed all of us “customers” to implement whatever wacky extensions and templates we needed.

I still hope that one day Microsoft will acknowledge that they use MediaWiki internally (and to great effect) and open-source the whole stack, or at least offer it as a hosted platform.

I tried setting up a production instance at my next employer, and we ended up using Confluence. It was like going back to the dark ages. But I couldn't make any reasonable financial argument against it: it would have taken a huge lift to get a vanilla MW instance integrated into the enterprise IT environment.

Microsoft did open source a bunch of their MediaWiki extensions. https://github.com/microsoft/mediawiki-extensions

Last I heard though they were moving off it.

Nice!! Made my day. Not sure how they can move off of it, partly because there's no alternative that has a fraction of the capability.

The rumour I heard is they were making their own custom thing.

There were some rumours that they were unhappy about MediaWiki's response to patches they submitted (they made a bunch around accessibility). However, I looked through their patches at one point when this rumour started flying around and it looked like most were merged. Those that weren't generally had code review comments, with questions or pointing out mistakes, which were never replied to. I sort of suspect the patch thing was some sort of internal excuse because the team involved wanted to make their own thing.

Regardless, I'm really happy they decided to open source their extensions and it was nice to see that they put in effort to upstream core patches.

Yeah the team running the platform internally was amazing to work with and did incredible work with just a handful of resources.

The efficiency for scale of MediaWiki is hard to beat.

A lot of things should be solved by having (micro)caching in front of your wiki. Almost all non-logged-in requests shouldn't even be hitting PHP at all.

In my experience this hasn't been necessary yet on anything I've run. I know WMF wikis run Varnish or something, but personally I'm trying to keep costs and complexity minimal. To that end, more caching isn't always desirable, because RAM is especially premium on low-end boxen. When tuned well, read-only requests on MediaWiki are not a huge problem. The real issue is actually just keeping the FPM worker pool from getting starved, but when it is starved, it's not because of read-only requests, but usually because of database contention preventing requests from finishing. (And to that end, enabling application-level caching usually will help a lot here, since it can save having to hit the DB at all.)

PHP itself is plenty fast enough to serve a decent number of requests per second on a low-end box. I won't put a number on it since it is obviously significantly workload-dependent, but suffice it to say that my concerns with optimizing PHP software usually tilt towards memory usage and database performance rather than the actual speed of PHP. (Which, in my experience, has also improved quite a lot just by virtue of PHP itself improving. I think the JIT work has great potential to push it further, too.)

The calculus on this probably changes dramatically as the RPS scales up, though. Not doing work will always be better than doing work in the long run. It's just that it's a memory/time trade-off and I wouldn't take it for granted that it always gives you the most cost-effective end result.

Varnish caching really only helps if the majority of your traffic is logged-out requests. It's the sort of thing that is really useful at a high scale but matters much less at a low scale.
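
For anyone who does go the Varnish route, the MediaWiki side of it is small. A minimal sketch, assuming Varnish runs on the same host at `127.0.0.1` and MediaWiki 1.34 or later (older releases used the `$wgUseSquid` family of names); the max-age value is an arbitrary illustration.

```php
<?php
// Fragment for LocalSettings.php (drop the opening tag when pasting into
// an existing file). Address and max-age are illustrative assumptions.

// Tell MediaWiki a caching proxy sits in front of it, so it emits
// cacheable headers for logged-out page views.
$wgUseCdn = true;

// Proxies MediaWiki should trust and send purge requests to on edits.
$wgCdnServers = [ '127.0.0.1' ];

// How long (in seconds) the proxy may keep a logged-out page view.
$wgCdnMaxAge = 3600;
```

The proxy itself still has to be set up to accept purges from the wiki and to pass logged-in traffic through, which is where most of the extra complexity lives.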

Application-level caching (memcached/Redis/APCu) is super important even at a small scale.

Most of the time (unless complex extensions are involved or your wiki pages are very simple), MediaWiki should be CPU-bound on converting wikitext -> HTML (which is why caching that process is important). Normally, if the DB is healthy, DB requests shouldn't be the bottleneck (unless you have extensions like SMW or Cargo installed).
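
A minimal sketch of turning on application-level caching, assuming a memcached instance on `127.0.0.1:11211` (the address is an assumption). The constants are defined by MediaWiki itself, so this only makes sense inside `LocalSettings.php`.

```php
<?php
// Fragment for LocalSettings.php (drop the opening tag when pasting into
// an existing file). The server address is an illustrative assumption.

// Use memcached as the main object cache.
$wgMainCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = [ '127.0.0.1:11211' ];

// Also keep the output of wikitext -> HTML parsing in memcached.
$wgParserCacheType = CACHE_MEMCACHED;

// On a single small box without memcached, APCu is the usual alternative:
//   $wgMainCacheType = CACHE_ACCEL;
```

Which backend makes sense depends on how much RAM can be spared; on a low-end box the cache competes with PHP workers and the database for memory.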

Most of MediaWiki seems to avoid too much trouble with contention in the database, but I was seeing it prior to enabling application-level caching. It seemed to be a combination of factors primarily driven by expensive tasks in the background. Particularly complex pages can cause some of those background tasks to become rather explosive.

Have any of Intel's server offerings been "premium" since EPYC hit the scene?

I just assumed they were still there based on momentum.

With Digital Ocean the cpuinfo is obfuscated so figuring out exactly what you're running on requires a bit more trickery. With that said I honestly assume that the comparison is somewhat older AMD against even older Intel, so it's probably not a great representation of how the battlefield has evolved.

That said, Digital Ocean is doing their customers a disservice by making the Premium Intel and Premium AMD SKUs look similar. They are not similar. The performance gap is absolutely massive.