Hacker News
by druiid 5022 days ago
Well, I have to say... replication-related issues like this are why we are now using a Galera-backed DB cluster. No need to worry about which server is active/passive; you can technically have them all live all the time. In our case we have two live and one failover that only gets accessed by backup scripts and some maintenance tasks.

Once we got the kinks worked out it has been performing amazingly! Wonder if GitHub looked into this kind of a setup before selecting the cluster they did.


any details on the kinks you worked out?
Sure. Maybe I should do a writeup for it on my blog at some point in the near future :).

The two main issues we encountered both had to do with product/category search on our sites. The first was that Galera/WSREP doesn't support MyISAM replication (it has beta support, but I wouldn't trust it). This meant we had to move our fulltext data to something else. The something else in this case was Solr, which has been a much better solution anyway (fulltext-based search was legacy, so I can kind of count this as a win).
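
A minimal sketch of how one might audit a schema for the MyISAM problem before moving to Galera. It is not the process described in this thread: `plan_conversion` and the sample table names are hypothetical, and it assumes a MySQL version where InnoDB has no FULLTEXT support (before 5.6), so tables with FULLTEXT indexes are held back until search has moved elsewhere (Solr, in this case).

```python
# Sketch only: the helper name and sample rows are hypothetical, not from the thread.

# Lists MyISAM tables outside the system schemas; run it with any MySQL client.
FIND_MYISAM_SQL = """
SELECT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.TABLES
WHERE ENGINE = 'MyISAM'
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')
"""


def plan_conversion(tables, fulltext_tables):
    """Split MyISAM tables into those safe to convert and those blocked by FULLTEXT.

    tables: iterable of (schema, table) pairs, e.g. the rows from FIND_MYISAM_SQL.
    fulltext_tables: set of (schema, table) pairs known to carry FULLTEXT indexes.
    Returns (list of ALTER TABLE statements, list of blocked (schema, table) pairs).
    """
    convert, blocked = [], []
    for schema, table in tables:
        if (schema, table) in fulltext_tables:
            # Assumption: InnoDB cannot hold FULLTEXT here, so search must move first.
            blocked.append((schema, table))
        else:
            convert.append(f"ALTER TABLE `{schema}`.`{table}` ENGINE=InnoDB;")
    return convert, blocked
```

Feeding it the query results plus the set of tables that still back fulltext search gives a list of `ALTER TABLE` statements to review and a list of tables that need a search replacement first.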

The second issue, and the one that was causing random OOM crashes, was partly due to a bug and partly due to the way the developer responsible for the search changes implemented things. The bug part is that Galera doesn't specifically differentiate between a normal table and a temp table. When you have very small/fast temporary tables that are created and truncated before the creation of the table is replicated across the cluster, it can leave some of these tables open in memory (memory leak, whoo!). We were able to fix this and have been happy ever since.

If there's any interest I can do a larger writeup about the actual implementation of the cluster, caveats and the like.

Consider this an expression of extreme interest on my part.
+1