Millions of separate topics on a single Kafka cluster? The way it's designed requires opening files for every one of those topics and their partitions, so good luck if you're trying that. You'll run out of file handles, then memory, and then disk access will completely freeze up.
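Here is a rough back-of-the-envelope version of the file-handle argument as a Python sketch. It assumes replicas are spread evenly across brokers, that each partition replica keeps `segments_per_partition` segments on disk, and that each segment is backed by three files (`.log`, `.index`, `.timeindex`). Real brokers vary with version, config and retention, so treat this as an order-of-magnitude estimate. The function name and parameters are hypothetical.

```python
def estimate_open_files(topics: int,
                        partitions_per_topic: int,
                        replication_factor: int,
                        brokers: int,
                        segments_per_partition: int = 1,
                        files_per_segment: int = 3) -> int:
    """Rough per-broker count of segment files, assuming an even replica spread.

    files_per_segment=3 assumes one .log, one .index and one .timeindex file
    per segment (an assumption; exact files depend on Kafka version/config).
    """
    total_replicas = topics * partitions_per_topic * replication_factor
    # Ceiling division: the busiest broker holds at least this many replicas.
    replicas_per_broker = -(-total_replicas // brokers)
    return replicas_per_broker * segments_per_partition * files_per_segment


# One million single-partition topics, replication factor 3, on 10 brokers.
print(estimate_open_files(topics=1_000_000, partitions_per_topic=1,
                          replication_factor=3, brokers=10))
```

Under those assumptions that is 900,000 files per broker before a single segment has rolled, which is why file handles are the first limit hit.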
I didn't think we were speaking of millions of topics here, only millions of logs. You can certainly have millions of logs in a single topic. Mux/demux would have to happen on the producer/consumer side, of course.
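A minimal, library-free sketch of what producer-side mux and consumer-side demux could look like, with a plain in-memory dict standing in for one shared topic. The logical log ID is used as the record key and hashed to a partition, so every record of a given log lands in the same partition and keeps its order. Assumptions: `NUM_PARTITIONS`, `partition_for`, `mux` and `demux` are hypothetical names, and `zlib.crc32` stands in for Kafka's default partitioner (which uses murmur2); no real Kafka client API is shown.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count for the single shared topic


def partition_for(log_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash, so a given logical log always maps to the same partition.
    # (Python's built-in hash() is randomized per process for str, so avoid it.)
    return zlib.crc32(log_id.encode("utf-8")) % num_partitions


def mux(records):
    """Producer side: route each (log_id, payload) into its partition."""
    partitions = defaultdict(list)
    for log_id, payload in records:
        partitions[partition_for(log_id)].append((log_id, payload))
    return partitions


def demux(partitions):
    """Consumer side: read each partition in order and split it back into logs."""
    logs = defaultdict(list)
    for p in sorted(partitions):
        for log_id, payload in partitions[p]:
            logs[log_id].append(payload)
    return logs
```

The trade-off is that a consumer interested in one log still reads every record in that log's partition and discards the rest.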
Do you mean log segments, then? In that case I don't see what's special about it, because that's just rolling files, and all of these systems can handle millions of those.
As for millions of topics: if you have to implement them at a logical layer yourself, you might as well use a system that supports them natively.
It does not. I've lost a lot of time profiling Kafka performance issues on clusters running on the exact same hardware with the exact same traffic, yet with a 3000% throughput difference. The root cause was that one cluster had a lot of empty test topics.
Try benchmarking Kafka from 0 partitions up to a few thousand partitions, in increments of 100. The benchmark only needs to write to a single topic, using Kafka's bundled producer perf tool, while all other topics stay inactive with zero data.
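One way to script that procedure is to generate the command sequence up front, as in the Python sketch below. Each step adds one idle topic with 100 partitions (which never receives data) and then reruns the producer perf test against a single `bench` topic. Assumptions: the broker address, topic names, record count and record size are hypothetical placeholders; the flags shown are standard options of `kafka-topics.sh` and `kafka-producer-perf-test.sh` in recent Kafka releases, but check them against your version.

```python
BOOTSTRAP = "localhost:9092"  # hypothetical broker address


def benchmark_plan(max_partitions: int = 3000, step: int = 100,
                   bootstrap: str = BOOTSTRAP):
    """Return [(idle_partition_count, [command, ...]), ...] for each step."""
    perf_test = ["kafka-producer-perf-test.sh", "--topic", "bench",
                 "--num-records", "1000000", "--record-size", "100",
                 "--throughput", "-1",  # -1 means no client-side throttling
                 "--producer-props", f"bootstrap.servers={bootstrap}"]
    plan = []
    for i, idle_partitions in enumerate(range(0, max_partitions + 1, step)):
        cmds = []
        if idle_partitions > 0:
            # Add `step` more idle partitions in a topic that never gets data.
            cmds.append(["kafka-topics.sh", "--bootstrap-server", bootstrap,
                         "--create", "--topic", f"idle-{i}",
                         "--partitions", str(step),
                         "--replication-factor", "1"])
        cmds.append(list(perf_test))  # then measure throughput on one topic
        plan.append((idle_partitions, cmds))
    return plan
```

Each command list can be run with `subprocess.run(cmd, check=True)` and the perf tool's summary line recorded per step.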
As the partition count increases, there is a very noticeable drop in throughput that looks to be linear.
Kafka does not currently handle a large number of partitions well, and "large" can mean as few as the low thousands. It's easy to hit with just a few hundred topics.
Reading between the lines when LinkedIn and Netflix advertise running several clusters, my guess is that they shard the data.
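If they do shard, the simplest scheme would be to assign each topic to one of several clusters by a stable hash, so no single cluster carries the full partition count. This Python sketch is purely illustrative: the cluster names and `cluster_for` are hypothetical, and nothing here describes how LinkedIn or Netflix actually route data.

```python
import zlib

# Hypothetical cluster names; not taken from any real deployment.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def cluster_for(topic: str, clusters=CLUSTERS) -> str:
    """Pick a cluster for a topic with a stable hash (crc32 modulo N).

    Every producer and consumer computes the same answer, so each cluster
    only ever hosts its own slice of the topics and partitions.
    """
    return clusters[zlib.crc32(topic.encode("utf-8")) % len(clusters)]
```

Plain modulo hashing remaps most topics when a cluster is added; consistent hashing or an explicit topic-to-cluster lookup table avoids that at the cost of extra machinery.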
I didn't think we were speaking of millions of topics or partitions here, only millions of logs. You can certainly have millions of logs in a single topic. Mux/demux would have to happen on the producer/consumer side, of course.