I'm wondering something: how is the storage/compaction solved? AFAIK S3 lacks append semantics, so data must be accumulated somewhere else before storing it. Kinesis?
We use a local disk to temporarily stage data before putting it on S3. We write smaller WAL (write-ahead log) objects, and a periodic compaction process creates read-optimized files on S3.
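For anyone who wants to see the shape of this pattern, here is a minimal sketch. It is an illustration under assumptions, not the actual implementation: `ObjectStore` is a hypothetical in-memory stand-in for S3 (whole-object put/get only, no append), the `wal/` key layout, the `flush_every` threshold, the JSON-lines encoding, and the `ts` sort key are all made up for the example, and it assumes the small WAL objects themselves live in the object store.

```python
import json
from pathlib import Path


class ObjectStore:
    """Hypothetical stand-in for S3: whole-object writes only, no append."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))

    def delete(self, key):
        del self._objects[key]


class StagedWriter:
    """Appends records to a local staging file, then uploads it as one WAL object."""

    def __init__(self, store, stage_dir, flush_every=3):
        self.store = store
        self.stage_path = Path(stage_dir) / "stage.jsonl"
        self.flush_every = flush_every  # assumed threshold; a real system might use size or time
        self.pending = 0
        self.seq = 0

    def append(self, record):
        # Local disk supports append, so records accumulate here first.
        with self.stage_path.open("a") as f:
            f.write(json.dumps(record) + "\n")
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        if self.pending == 0:
            return
        # Upload the staged batch as a single immutable WAL object.
        key = f"wal/{self.seq:08d}.jsonl"
        self.store.put(key, self.stage_path.read_bytes())
        self.stage_path.unlink()
        self.seq += 1
        self.pending = 0


def compact(store, out_key):
    """Merge all WAL objects into one sorted object, then delete the WAL objects."""
    wal_keys = store.list("wal/")
    records = []
    for key in wal_keys:
        for line in store.get(key).decode().splitlines():
            records.append(json.loads(line))
    # "Read-optimized" is reduced here to sorting by timestamp.
    records.sort(key=lambda r: r["ts"])
    store.put(out_key, "\n".join(json.dumps(r) for r in records).encode())
    for key in wal_keys:
        store.delete(key)
    return len(records)
```

The point of the sketch is that nothing ever appends to an existing object: each flush writes a new small object, and compaction replaces many small objects with one larger one. A real system would also need crash recovery for the staging file and a safe handoff so readers never see a half-compacted state.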