by scaleout1
2790 days ago
Hey man, that's pretty cool. We do exactly the same thing using Cassandra instead of FDB. Since Cassandra doesn't support transactions at high volume (100K tps), we do a shuffle so that all read/modify/write operations for the same key happen on the same machine. It seems like with FDB you can get away without that, since it supports transactions? My question to you is: what volume is your system operating at? Also, how does it handle skew? Let's say you need to update the HLL for a key that is heavily skewed: does your FDB transaction unwind fast enough not to slow down the whole system?
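The shuffle described above amounts to partitioning updates by key, so that exactly one worker owns each key's read/modify/write and no cross-machine transaction is needed. A minimal Python sketch, assuming a fixed worker count and using MD5 only as a stable hash; `route` and `shuffle` are hypothetical names, not the actual Cassandra pipeline:

```python
import hashlib
from collections import defaultdict


def route(key: str, num_workers: int) -> int:
    """Pick the worker that owns `key`; every update for a key goes to the same worker."""
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so it would route the same key differently after a restart.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def shuffle(updates, num_workers):
    """Group (key, value) updates by their owning worker, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in updates:
        partitions[route(key, num_workers)].append((key, value))
    return partitions
```

Because each worker is the only writer for its keys, it can apply them one after another with a plain read/modify/write. The flip side is exactly the skew concern in the question: a single hot key always lands on one worker.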
> what is the volume your system is operating at?
This varies, since our workload is dynamic: anyone can inject a query against the data stream at any time. For the sake of argument, let's say 5k.
> Also how does it work for skews?
FoundationDB does a magnificent job of automatically detecting skew and physically relocating data to relieve it. However, to mitigate write skew, I use time bucketing: part of the key is a MURMUR3 hash of the minute_of_hour, so a heavy write load can only affect a given server for one minute. This has helped with certain metrics.
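A rough illustration of the bucketing idea: FDB stores keys in sorted order and splits them into ranges across servers, so prefixing the key with a hash of the minute scatters consecutive minutes across the keyspace instead of piling them onto one range. This is a sketch under assumptions: the key layout (4-byte hash prefix, metric name, then a minute timestamp) and `bucketed_key` are hypothetical, and MurmurHash3 (x86, 32-bit) is written out here because Python's standard library doesn't include it:

```python
import struct
from datetime import datetime

MASK = 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = struct.unpack_from("<I", data, i * 4)[0]
        k = (k * c1) & MASK
        k = ((k << 15) | (k >> 17)) & MASK
        k = (k * c2) & MASK
        h ^= k
        h = ((h << 13) | (h >> 19)) & MASK
        h = (h * 5 + 0xE6546B64) & MASK
    tail = data[nblocks * 4:]  # remaining 0-3 bytes
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & MASK
        k = ((k << 15) | (k >> 17)) & MASK
        k = (k * c2) & MASK
        h ^= k
    h ^= len(data)  # finalization mix (fmix32)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


def bucketed_key(metric: str, ts: datetime) -> bytes:
    """Hypothetical layout: murmur3(minute_of_hour) prefix / metric / minute timestamp."""
    prefix = murmur3_32(struct.pack("<I", ts.minute)).to_bytes(4, "big")
    # The timestamp suffix keeps minute 5 of one hour separate from minute 5 of the next.
    return prefix + b"/" + metric.encode("utf-8") + b"/" + ts.strftime("%Y%m%d%H%M").encode()
```

All writes for a metric within one minute share a prefix (and therefore one key range). When the minute rolls over, the prefix jumps somewhere else in the keyspace, which bounds how long a burst can hammer a single server.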
> Lets say you need to update HLL for a key that is heavily skewed, does your FDB transaction unwind fast enough not to slow down the whole system?
There isn't really a concept of an HLL (or key) being heavily skewed. A key lives on a single server (or several, depending on replication). Essentially, when I want to merge additional HLL content into one that's already stored, I read it, deserialize it, merge it with the one I have, and then write the result back to FDB. Transactions guarantee that nobody else modifies that key between my read and my write. If someone did, then my transaction (or theirs) would fail and retry. The retry is important: it reruns the same logic, except that this time the value I read from the database is the merged result somebody else just committed. This allows you to ensure that idempotent / atomic operations happen as you'd expect.
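To make the read/merge/write-with-retry loop concrete without needing a running cluster, here is a self-contained sketch. `VersionedStore` is a hypothetical in-memory stand-in for FDB's optimistic concurrency (a commit fails if any key it read has changed since). The HLL is reduced to its raw register bytes, merged by taking the per-register max. In the real Python binding, the retry loop is roughly what the `@fdb.transactional` decorator provides; the names below are illustrative only:

```python
import threading


class Conflict(Exception):
    """Raised when a key read by a transaction changed before it committed."""


class VersionedStore:
    """Hypothetical in-memory stand-in for FDB: commits fail on stale reads."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def commit(self, reads, writes):
        """Apply writes only if every key read still has the version seen at read time."""
        with self._lock:
            for key, seen in reads.items():
                if self._data.get(key, (0, None))[0] != seen:
                    raise Conflict(key)
            for key, value in writes.items():
                version = self._data.get(key, (0, None))[0]
                self._data[key] = (version + 1, value)


def run_transaction(store, fn, max_retries=10):
    """Run fn(get, set_) and commit; on conflict, rerun it against fresh reads."""
    for _ in range(max_retries):
        reads, writes = {}, {}

        def get(key):
            version, value = store.read(key)
            reads[key] = version  # remember what we saw, for conflict checking
            return value

        def set_(key, value):
            writes[key] = value  # buffered until commit

        fn(get, set_)
        try:
            store.commit(reads, writes)
            return
        except Conflict:
            continue  # the next attempt reads the other writer's merged result
    raise RuntimeError("too many conflicts")


def merge_hll(a: bytes, b: bytes) -> bytes:
    """Merge two HLLs with the same register count: per-register max."""
    return bytes(max(x, y) for x, y in zip(a, b))


def merge_into(store, key, incoming: bytes):
    """Read the stored HLL, merge in `incoming`, write it back, retrying on conflict."""
    def txn(get, set_):
        current = get(key)
        set_(key, incoming if current is None else merge_hll(current, incoming))
    run_transaction(store, txn)
```

Because the merge is a max, rerunning it after a conflict is harmless: the retry simply folds the new registers into whatever the other writer committed.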