by EdwardDiego 1902 days ago
> having a parallel consumer over a large partition spanning several GBs also requires tons of RAM because a segment must be loaded into memory.

I don't know too much about Kafka's internals, but that's not my experience of reading several terabytes of data from a Kafka topic. Memory didn't blow out, although we did burn through IOPS credits.

1 comments

Good remark. That holds when there's one consumer, or multiple consumers sitting at roughly the same offset (and so using the same memory-mapped segment). Having many consumers sitting at widely different offsets will cause many segments to sit in RAM.
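
To make that concrete, here is a rough sketch of the argument, not of how Kafka actually manages memory. It assumes 1 GiB segments (Kafka's default `log.segment.bytes`), treats each consumer's position as a byte position in the partition, and assumes a segment is "hot" whenever any consumer is reading from it. The function name and the sample positions are made up for illustration.

```python
SEGMENT_BYTES = 1 << 30  # assumed segment size: 1 GiB


def segments_touched(consumer_positions, segment_bytes):
    """Return the set of segment indices the given byte positions fall into."""
    return {pos // segment_bytes for pos in consumer_positions}


# Eight hypothetical consumers reading within a few MB of each other.
clustered = [5_000_000_000 + i * 1_000_000 for i in range(8)]

# Eight hypothetical consumers spread three segments apart.
spread = [i * 3 * SEGMENT_BYTES for i in range(8)]

print(len(segments_touched(clustered, SEGMENT_BYTES)))  # 1
print(len(segments_touched(spread, SEGMENT_BYTES)))     # 8
```

Same number of consumers in both cases, but the clustered group shares one hot segment while the spread group keeps eight of them hot.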

Edit: Kafka apparently does not keep a complete log segment in memory, only parts of it, but having many consumers may still lead to a lot of churn or a lot of memory consumed. Maybe this is getting better.