by thetrooperer 2453 days ago
That's a good question. There are multiple reasons for this. I'll briefly mention two of them. One is the high fan-in ratio - millions of machines are writing relatively small blobs of data, so the middle layer serves as an aggregator (which saves the backend's IOPS, number of connections, etc.). Another reason is the volume of metadata - it would be inefficient to keep all the LogDevice-level metadata on each of the producer hosts.
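To make the fan-in point concrete, here is a minimal sketch of what an aggregating middle layer does: it buffers small blobs per log and forwards them to the backend in larger batches. The `Aggregator` class, the `backend_append` callback, and the `max_batch_bytes` threshold are all hypothetical names made up for illustration; this is not LogDevice's actual write service or API.

```python
from collections import defaultdict


class Aggregator:
    """Hypothetical middle-tier batcher (illustration only, not a real API)."""

    def __init__(self, backend_append, max_batch_bytes=1024):
        # backend_append(log_id, batch) stands in for one backend write.
        self.backend_append = backend_append
        self.max_batch_bytes = max_batch_bytes
        self._buffers = defaultdict(list)  # log_id -> pending blobs
        self._sizes = defaultdict(int)     # log_id -> pending byte count

    def write(self, log_id, blob):
        """Buffer a small blob; flush once the batch is large enough."""
        self._buffers[log_id].append(blob)
        self._sizes[log_id] += len(blob)
        if self._sizes[log_id] >= self.max_batch_bytes:
            self.flush(log_id)

    def flush(self, log_id):
        """Send everything buffered for log_id as a single backend write."""
        batch = self._buffers.pop(log_id, [])
        self._sizes.pop(log_id, None)
        if batch:
            self.backend_append(log_id, batch)

    def flush_all(self):
        """Flush every log that still has pending blobs."""
        for log_id in list(self._buffers):
            self.flush(log_id)


# Usage: 1000 ten-byte writes turn into far fewer backend operations.
backend_calls = []
agg = Aggregator(lambda log_id, batch: backend_calls.append((log_id, len(batch))),
                 max_batch_bytes=100)
for _ in range(1000):
    agg.write("log-1", b"x" * 10)
agg.flush_all()
print(len(backend_calls))
```

A real aggregator would also flush on a timer and handle backend failures, which is where the message-loss trade-off comes in: anything sitting in the buffer is lost if the aggregator dies before flushing.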

Would the WriteService (aggregator) make sense for environments with thousands of machines (not millions), all within a single datacenter? At our company, we are moving away from the aggregator design toward writing directly to storage wherever possible, since it reduces message loss.

As for the volume of metadata held by producers, would there be any significant difference between holding WriteService metadata and LogDevice metadata?

The devil is probably in the details, but if you have a single datacenter and all writes come from thousands of machines ("edge"), then yes, it may make more sense to set up a single LogDevice / Kafka cluster and have all the edge hosts write to it directly.