Hacker News new | ask | show | jobs
by sepin4 1858 days ago
I was talking about a two "node" Mongo setup which would mirror the capabilities of a classic two node setup of Oracle RAC for example. Since in Mongo this is not a clear 1:1 match due to a different model of operation, in order to get the same performance/continuity as RAC here is what I mean.

If I want to gain a relative 100% increase in concurrent read/write access and transparent session failover to my application I have to procure two servers(with roughly the same performance lets put it at rawPerf=1.0 for simplicity) and a SAN/NAS with SCSI HBA and that would be it. All the necessary components will be running on those two servers and the application will have access to almost all capabilities of a single instance with a roughly twice the performance and fault tolerance. To do that in Mongo, I would have to procure 2 replica sets with 3 instances each(bare minimum to get the fault tolerance and not use arbiters which according to Mongo University is a bad idea) for the actual data and another RS for the configuration meta-data, which is another 3 instances. I will also need instances where I can run my "mongos" to route/consolidate my requests and to have fault tolerance I will need at least two of those. Now to do a break down of the different instances. From the two replica sets for data, I can use only 2 of the instances for writes and the remaining four only for consistent reads and that is only with the appropriate write concern. And while I have two instances for writes I am not guaranteed to get a consistent increase in write operations unless the data is at least somewhat evenly distributed. I will need six rawPerf 0.5 servers(assuming 0.5 rawPerf is capable to handle roughly half the data), that is a total of 3.0 so far. The configuration RS while not as demanding as the actual data RS will still require resources so I count it as roughly between 0.5-1.0 total for all three instances. And finally I have the "mongos" instances which although will not need disk space will need some CPU and the same amount of RAM as a regular instance in case the queries and not optimized, so I will have to give it 0.5 per instance. And the total amount of resources comes to 3.0+0.5+1.0=4.5. On the other hand with the Oracle RAC I would've needed 2.0.

As for the schema flexibility, from the Mongo documentation I take it that once the shard key is created it can't be changed arbitrarily. It can only be further fine-grained by adding more keys to it, however this is available from version 4.4. In order to change it cardinally, I would have to recreate the collection.

I'm sure you are well aware of all this, however I'm just stating my experience with MongoDB so far.

1 comments

AH I see. When we talk about nodes in MongoDB they are the elements of a replica set. That is what confused me. Yes, you are right, to get the equivalent performance of a two node database server you would have to run a sharded cluster with two shards. That is definitely more nodes than the two node design you have outlined for the RAC like design.

I am not familiar with Oracle RAC architecture but casual mention of a SAN fails to recognise the cost and complexity of even the smallest SAN. On the other hand MongoDB can run on dumb disks without a SAN or RAID required (though RAID can help with performance).

With you two node system loss of a node results in a 50% loss in performance whereas with MongoDB loss of a node will result in a secondary taking over the primary role with no loss in performance. You get what you pay for.

Setting up such clusters used to be a PITA until the advent of MongoDB Atlas. Now you can set them up with a few mouse clicks.

As regards shard keys. Yes, they are immutable, but the documents they index are not. So dynamic schema properties are preserved in a sharded cluster.

While a SAN/NAS is overall an expensive hardware in both price and manageability, there are entry level devices. You can even get direct SCSI devices in some cloud services(Azure has "shared disks"). For Oracle RAC the data is accessed from same location, it's not distributed/replicated across nodes. It's literally the same disks attached to all nodes. Also ASM works with raw block devices which bypasses the need for file system abstraction and all the latency that comes with it. You could also deploy a cluster on a cluster/network file system, but this would generally not be a good idea since you get a lot of overhead.

Postgre clusters can be configured in similar fashion.

I'm not saying that Mongo doesn't have it's strengths. However using the full potential of your environment requires much more planing and has a lot more pitfalls.

I was particularly discussing the implications on what the relative immutability of the shard key has on the data distribution and that it is not that easy to change.