by ryanworl
2966 days ago
One potential problem is that a Kafka partition's size is limited by the storage of the smallest machine in its replica set. This means that if you want infinite retention, you have to either over-partition so partitions never get too big, keep buying bigger machines and disks, or deal with repartitioning all of the data. A simple way around this is to dump messages into files and put each file in S3 under a name like “topic-partition-offset”, where offset is the offset of the first message in that file. You can then read those files forward, starting from offset zero, until you reach the end, then switch to reading recent data from Kafka. The drawback is that this isn't integrated with Kafka, so you're now maintaining what is effectively two different systems for the same data. It also means key-based compaction won't work, so you'd have to re-implement it on top of the files in S3.
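A minimal sketch of the naming and replay logic described above, assuming an in-memory list of object keys stands in for an S3 listing (no S3 client is used). The zero-padding of the offset is an assumption added so that lexicographic listing order matches numeric order; `segment_key`, `parse_key`, `segments_from` and `compact` are hypothetical helper names, not part of Kafka or any AWS SDK. The `compact` function illustrates the "keep the latest value per key, null value = tombstone" idea you would have to re-implement over the S3 files.

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Tuple


def segment_key(topic: str, partition: int, first_offset: int) -> str:
    """Build a 'topic-partition-offset' name for a segment file.

    Zero-padding the offset (an assumption) keeps lexicographic order
    identical to numeric order when listing keys.
    """
    return f"{topic}-{partition}-{first_offset:020d}"


def parse_key(key: str) -> Tuple[str, int, int]:
    # rsplit so topic names containing '-' still parse correctly
    topic, partition, offset = key.rsplit("-", 2)
    return topic, int(partition), int(offset)


def segments_from(keys: Iterable[str], start_offset: int) -> List[str]:
    """Return the segment keys to read, in order, to replay from start_offset.

    The first key returned is the file whose first offset is <= start_offset
    (i.e. the file that contains it); every later file follows in order.
    """
    ordered = sorted(keys, key=lambda k: parse_key(k)[2])
    firsts = [parse_key(k)[2] for k in ordered]
    i = bisect_right(firsts, start_offset) - 1
    return ordered[max(i, 0):]


def compact(records: Iterable[Tuple[bytes, Optional[bytes]]]) -> dict:
    """Keep only the latest value per key; a None value acts as a tombstone."""
    latest = {}
    for key, value in records:
        if value is None:
            latest.pop(key, None)  # tombstone deletes the key
        else:
            latest[key] = value
    return latest
```

After consuming the last file returned by `segments_from`, a consumer would then seek its Kafka consumer to one past the last offset it read from S3, which is the handoff point the comment describes.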
Growing LVM volumes with XFS has worked well: zero downtime and around 60 seconds.
It lets you over-provision just enough that you don't have to babysit the drives or pay $$$ for unused disk.
If you stripe the volumes, you'll also spread your IOPS across them in AWS.
Outside AWS, LVM still applies. Kafka's JBOD is useless without easy / automatic rebalancing.
This week, on-site at a client, I discovered ScaleIO, which can present a volume of up to 1 PB and does clever sharding/replication in the background.
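A rough sketch of the online grow procedure mentioned above, assuming the new disks are already attached. It only builds the command lines (nothing is executed), using the standard `pvcreate`, `vgextend`, `lvextend` and `xfs_growfs` tools, plus an `lvcreate` line for creating a striped volume. The volume group, logical volume, device and mount point names are hypothetical. One caveat, stated as an assumption to check against your LVM version: extending a striped LV keeps its stripe count by default, so you typically need to add at least as many new devices as there are stripes.

```python
from typing import List


def grow_commands(vg: str, lv: str, new_devices: List[str],
                  mount_point: str) -> List[List[str]]:
    """Build (but do not run) the commands to grow XFS-on-LVM while mounted."""
    cmds = [["pvcreate", dev] for dev in new_devices]  # init each new disk
    cmds.append(["vgextend", vg, *new_devices])        # add disks to the VG
    cmds.append(["lvextend", "-l", "+100%FREE", f"/dev/{vg}/{lv}"])
    # XFS can only grow, and it does so online; xfs_growfs takes the mount point
    cmds.append(["xfs_growfs", mount_point])
    return cmds


def striped_create_command(vg: str, lv: str, stripes: int,
                           stripe_kib: int = 64) -> List[str]:
    """Build an lvcreate call that stripes the LV across `stripes` PVs."""
    return ["lvcreate", "-i", str(stripes), "-I", str(stripe_kib),
            "-l", "100%FREE", "-n", lv, vg]


if __name__ == "__main__":
    # Hypothetical names; print the plan rather than running it
    for cmd in grow_commands("kafka_vg", "data",
                             ["/dev/xvdf", "/dev/xvdg"], "/var/lib/kafka"):
        print(" ".join(cmd))
```

Printing the plan instead of calling `subprocess.run` keeps the sketch safe to run anywhere; in practice you would review the commands and execute them as root.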