|
|
|
|
|
by theikkila
1044 days ago
|
|
Related to 1. If I understood corrently the agent generates single object per each flushing interval containing all data accross all topics it has received. Does this mean that when reading the consumer needs to read multiple partition data simultaneously to access just single partition? How about scaling consumers horizontally how does WarpStream Agent handle horizontal partitioning of the stream from consuming side? |
|
That is correct about flushing. RE: consuming. The TLDR; is that the agents in an availability zone cluster with each other to form a distributed file cache such that no matter how many consumers you attach to a topic, you will almost never pay for more than 1 GET request per 4MiB of data, per zone. Basically when a consumer fetches a block of data for a single partition, that will trigger an "over read" of up to 4MiB of data that is then cached for subsequent requests. This cache is "smart" and will deduplicate all concurrent requests for the same 4MiB blocks across all agents within an AZ.
It's a bit difficult to explain succinctly in an HN comment, but hopefully that helps.