Hacker News new | ask | show | jobs
by sergei 4895 days ago
1) Both joins and aggregates are processed in parallel. In short, the larger your cluster, the faster we evaluate a complex query because we can bring more hardware to the problem.

You can get a better idea of how we scale distributed joins from one of my blog posts:

http://www.clustrix.com/blog/bid/242790/Scaling-Distributed-...

Once we complete the join, we will also perform the aggregate SUM operation across multiple nodes, getting individual sums, and then forwarding the results to another node for a final tally.

As a rule of thumb, each node has the throughput of about 70% of MySQL on the same hardware. So if you have a complex query that can take advantage of the cluster (100k rows per table examined qualifies), then you will see a 10x speedup on a 16 node cluster over a single instance.

However, that assumes that your entire data set fits in cache on the single instance box. As soon as that's no longer true, it's more likely that you will see a 100x or more speedup from Clustrix because you will have way more memory available for the buffer cache.

For example, if you have a 32GB single instance MySQL server, but you have more than 64GB of working set size, you're going to be limited by IO.

On Clustrix, if you move this to say a 4 node cluster, you are going to have an effective cache footprint of 128GB. So Clustrix will operate entirely out of cache.

2) We're fully ACID compliant using a two phase commit w/ PAXOS for transaction resolution. Because we're MVCC, readers never have to take out locks. To coordinate WRITE access to the same data, we have a row level distributed lock manager. That means that the entire cluster participates in lock management.

So yes, this has a higher communication overhead over a single server. If you have a small write workload with very little concurrency, you will see better query latencies from a single instance database.

However, as soon as you start getting to more serious write workloads with higher concurrency, Clustrix will distribute your writes across several machines in the cluster. We can handle an aggregate write throughput you can't touch with a single instance database, even with a fancy/expensive storage array.

One of our customers, for example, has a peak throughput of 70,000 transactions/sec. They're about 90/10 read/write, so that breaks down to 10,000 TPS for writes. And they are really fond of multi-row updates, insert .. ignore, insert .. on dup update kind of queries. So it's not exactly a lightweight key-value application. Prior to Clustrix, their MySQL installation peaked out at 5,500 TPS total (read+write). They now have a 15 node Clustrix cluster.

It's true, you can always construct a workload that will not distribute well. But these are generally rare degenerate cases that stem from poor application design, and would perform equally poorly on a single instance database.

2 comments

Very interesting. Thank you!
How do you handle distributed joins with group by's? Are there certain SQL syntaxes which aren't handled well by queries? What are the 90th percentile response times?