by bacheaul 1482 days ago
There's also the file system cache to consider, which Kafka famously leans on heavily for IO performance. If the majority of your consumers are reading the latest messages which were just written, these will likely come from the cache which is in memory. A consumer reading from the earliest messages on a large topic could conceivably cause changes to what's available from file system cache for other consumers reading from the latest messages, so they're not necessarily totally isolated. I've not taken measurements of this though to say it's an actual issue, just saying that I wouldn't dismiss it.
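To make the concern concrete, here's a toy sketch, not a measurement. It assumes the page cache behaves like a plain LRU, which is a simplification (real kernels use more scan-resistant policies, so the effect should be milder in practice). All names and parameters (`LRUPageCache`, `simulate`, `lag`, `scan_rate`) are made up for illustration and have nothing to do with Kafka's code.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy stand-in for the OS page cache: strict LRU, fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def _touch(self, page):
        # Mark the page as most recently used, evicting the LRU page if full.
        self.pages[page] = True
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def write(self, page):
        # A freshly written page lands in the cache.
        self._touch(page)

    def read(self, page):
        # Returns True on a cache hit, False if it had to come from disk.
        hit = page in self.pages
        self._touch(page)
        return hit


def simulate(backfill, capacity=100, steps=1000, lag=10, scan_rate=20):
    """Return the tail consumer's cache hit rate.

    Each step the producer appends one page and the tail consumer reads the
    page written `lag` steps earlier. If `backfill` is True, a second
    consumer also scans `scan_rate` old pages per step from the start.
    """
    cache = LRUPageCache(capacity)
    hits = reads = scanned = 0
    for step in range(steps):
        cache.write(("tail", step))
        if step >= lag:
            reads += 1
            if cache.read(("tail", step - lag)):
                hits += 1
        if backfill:
            for _ in range(scan_rate):
                cache.read(("old", scanned))
                scanned += 1
    return hits / reads


if __name__ == "__main__":
    print("tail-only hit rate:    ", simulate(backfill=False))
    print("with backfill hit rate:", simulate(backfill=True))
```

In this model the tail consumer always hits the cache on its own, and a fast enough scan from the start pushes its pages out before it gets to them. How much of that survives a real kernel and real hardware is exactly what I haven't measured.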
2 comments

FS cache is, at least for me, included in "i/o resources". But this cache pressure will occur whenever any consumer reads any partition of any topic from any segment, other than the tail, that isn't already being read by someone else (or even from the tail segment of a partition that isn't being produced to); it's not specific to partition 0. And I don't believe you'd gain anything by turning on a mirroring cluster rather than increasing the number of brokers in the same cluster; in both cases you're solving it by spreading the i/o load out more.
Kafka special-cases the "tail" of a partition: the open log segment (i.e., the very end of the partition that's still being written to) is never evicted, and log segments closer to the tail are evicted last, IIRC.

It definitely prioritises tail consumption over reads from offset 0.