by ZGF4
3268 days ago
There are a few misguided views in this article and in some of these comments.

1. Every shardable database (Cassandra, Dynamo, BigTable) has to worry about hot spots. Picking a UUID as a partition key is only step one. What happens if one user accounts for a huge majority of your traffic? All of their reads and writes go to a single partition, and of course you are going to suffer performance issues from that hot spot. It becomes important to break the partition down further into synthetic shards, or to break up your data by time (for example, keep only a day of data per shard). BigTable does not innately solve this. It may deal better with a large partition, but it will inevitably become a problem.

2. Some people are criticizing the choice of NoSQL, citing the data size. Note that you can have a small data size but huge write traffic. An unsharded RDBMS will not scale well to this, since you cannot distribute the writes across multiple nodes. Don't assume that just because someone has a small data set, they don't need NoSQL to deal with their volume.
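
To make point 1 concrete, here is a minimal sketch of a partition key that combines a time bucket with a synthetic shard. The names `NUM_SHARDS`, `synthetic_shard`, `partition_key`, and `keys_for_read`, as well as the `user#day#shard` key layout, are assumptions made up for illustration. They are not any particular database's API.

```python
import hashlib
from datetime import datetime, timezone

NUM_SHARDS = 8  # hypothetical shard count; tune to the write rate of the hottest user


def synthetic_shard(item_id, num_shards=NUM_SHARDS):
    """Map an item id to a stable shard number in [0, num_shards)."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_key(user_id, item_id, ts, num_shards=NUM_SHARDS):
    """Build 'user#day#shard' so one user's writes spread over many partitions.

    `ts` is assumed to be a timezone-aware datetime.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{user_id}#{day}#{synthetic_shard(item_id, num_shards)}"


def keys_for_read(user_id, day, num_shards=NUM_SHARDS):
    """A read of one user's day has to fan out over every synthetic shard."""
    return [f"{user_id}#{day}#{shard}" for shard in range(num_shards)]
```

The trade-off is that writes for a single hot user now land on up to `NUM_SHARDS` partitions per day, while a read of that user's day has to query every key returned by `keys_for_read` and merge the results.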
Yeah, but the issue with DynamoDB seems to be bursts of access triggering "throughput exceptions". These are caused by a very static bandwidth allocation, in which the share per shard goes down as the number of shards goes up, and by not-so-graceful handling of overload situations.
IMHO it is an anti-pattern to split up the bandwidth like they do. It negates the multiplexing gain for no good reason except a rigid control model.
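
A toy model of the multiplexing argument, as a sketch only: it assumes provisioned capacity is divided evenly across partitions with no borrowing, and it is not a description of DynamoDB's actual internals. The function names and the numbers are made up for illustration.

```python
def throttled_static(demand, total_capacity):
    """Requests rejected when each partition gets a fixed, equal slice."""
    per_partition = total_capacity / len(demand)
    return sum(max(0.0, d - per_partition) for d in demand)


def throttled_pooled(demand, total_capacity):
    """Requests rejected when all partitions draw from one shared pool."""
    return max(0.0, sum(demand) - total_capacity)


# One bursting partition and nine quiet ones; total demand is 750 of 1000.
demand = [300] + [50] * 9
print(throttled_static(demand, 1000))  # 200.0
print(throttled_pooled(demand, 1000))  # 0.0
```

In this model the static split rejects 200 requests even though a quarter of the total capacity is idle, and the slice per partition only gets smaller as more partitions are added.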