| HN Mirror


	by AhmedSoliman 2879 days ago
	It's a very different architecture and design. You can head to https://logdevice.io/docs/Concepts.html to learn more about how LogDevice works. In terms of function, LogDevice is similar to the core of Apache Kafka.

2 comments

majidazimi 2879 days ago

True, but Kafka has two very annoying features built into it:

- There is no many-to-many log recovery, whereas in Pulsar/DistributedLog, for example, logs are stored in small segments and distributed across multiple nodes.

- Read scalability. Since the entire log is stored on one node (with some replicas), readers are bound to the sequential read capacity of a single disk. Again, Pulsar stores logs in segments that are distributed among broker nodes, which helps a lot when there are many readers.

sh00s 2878 days ago

LogDevice has many-to-many rebuilding as well, and typically the data for a log (similar to a partition in Kafka) is spread relatively uniformly over the set of shards that hold data for that log. That set is potentially large, much bigger than the replication factor.

LgWoodenBadger 2878 days ago

I'm not sure how accurate your comment is regarding Kafka's annoying features given that Kafka has partitions, which "solve" all of the problems you stated.

majidazimi 2878 days ago

No, it doesn't, since a single partition is stored sequentially on one disk, which limits the consumers to the bandwidth of a single disk (say c1 reads the beginning of the partition and c2 the end of the partition). But in the case of Pulsar, c1 is most probably connected to a different node than c2.

adrienconrath 2878 days ago

LogDevice has this concept of a "node set", which is the set of storage nodes that can be selected by the sequencer as recipients for a record or a block of records. A typical node set size is around 20-30 in our deployments. Each storage node in the node set contains a subset of the records (or blocks of records) of the log; we call that subset a log strand. The amount of IO capacity available to append records to a log or read records from a log scales with the size of the node set.

All of this is done while preserving the total ordering guarantee, thanks to the separation of sequencing and storage.

The operator could, for example, set a bigger node set size for logs that are known to have multiple consumers and require more IO capacity.

At Facebook, we have use cases where a single consumer needs to replay a backlog of records in a log, sometimes hours' or days' worth of data, to rebuild its state. We call this a backfill. Node sets allow the IO to be spread across multiple disks, which improves backfill speed and helps reduce hotspots.

-- Adrien from the LogDevice team.
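
For readers who want to see the node set idea concretely, here is a minimal toy sketch in Python. It is not LogDevice's API: the `Sequencer` class, the `read_log` function, and the random placement policy are all assumptions made for illustration. It only shows how a single sequencer can hand out sequence numbers while the records themselves are scattered over a node set, and how a reader recovers the total order by merging the per-node strands.

```python
import heapq
import random
from collections import defaultdict


class Sequencer:
    """Toy sequencer (hypothetical, not the LogDevice API).

    It owns the sequence numbers; storage nodes only hold records.
    """

    def __init__(self, node_set, replication, seed=0):
        self.node_set = list(node_set)   # nodes eligible to store this log
        self.replication = replication   # copies stored per record
        self.next_lsn = 1                # next sequence number to assign
        self.rng = random.Random(seed)   # random placement is an assumption
        self.storage = defaultdict(list) # node -> its strand of (lsn, payload)

    def append(self, payload):
        lsn = self.next_lsn
        self.next_lsn += 1
        # Pick `replication` distinct nodes out of the node set for this record.
        for node in self.rng.sample(self.node_set, self.replication):
            self.storage[node].append((lsn, payload))
        return lsn


def read_log(storage):
    """Merge all strands by sequence number and drop replica duplicates."""
    merged = heapq.merge(*storage.values(), key=lambda rec: rec[0])
    records, last_lsn = [], 0
    for lsn, payload in merged:
        if lsn > last_lsn:
            records.append((lsn, payload))
            last_lsn = lsn
    return records


seq = Sequencer(node_set=range(20), replication=3)
for i in range(100):
    seq.append(f"record-{i}")

log = read_log(seq.storage)
strand_sizes = {node: len(strand) for node, strand in seq.storage.items()}
```

Each strand holds only a fraction of the log, so a reader pulls from many disks at once, yet the merged result is still in sequence-number order because ordering was decided by the sequencer and not by where the data landed.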

sh00s 2878 days ago

Splitting data into partitions would mean there's no total order on that data anymore, right?

LgWoodenBadger 2878 days ago

Kafka only guarantees a total order within a given partition, not across them.
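
A small sketch of what that means in practice, with no Kafka client involved. The `partition_for` and `produce` helpers are hypothetical, and CRC32 is only a stand-in hash (the Java client's default partitioner uses murmur2). Records with the same key land in the same partition and keep their relative order there, but nothing relates the order of records in different partitions.

```python
import zlib


def partition_for(key, num_partitions):
    # Stand-in partitioner: hash the key, take it modulo the partition count.
    return zlib.crc32(key.encode()) % num_partitions


def produce(events, num_partitions):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for arrival, (key, value) in enumerate(events):
        # `arrival` records the global arrival order, kept only for inspection.
        partitions[partition_for(key, num_partitions)].append((arrival, key, value))
    return partitions


events = [(f"user-{i % 5}", f"event-{i}") for i in range(20)]
partitions = produce(events, num_partitions=3)

# Within one partition, arrival order is preserved.
per_partition_order = [[arrival for arrival, _, _ in p] for p in partitions]
```

A consumer reading the partitions independently sees each `per_partition_order` list in increasing order, but it has no way to interleave the lists back into the original arrival order unless the records carry that information themselves.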

Dowwie 2879 days ago

Does it require at least one fully dedicated FTE to set up, maintain, and use correctly?

Curious where LogDevice would be a bad decision.