Does it block reads until sealing is complete? How many nodes in the nodeset have to respond before sealing is complete? [NodeSet] - [ReplicationFactor] + 1?
Yes, reads are not released (i.e. are blocked) until sealing is complete. We call the minimal set of nodes sufficient to serve reads for a log (the same set is needed for sealing to complete) an f-majority.
For the simple case where data placement is location-agnostic, an f-majority is indeed any n - r + 1 nodes, where n is the nodeset size and r is the replication factor.
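A small brute-force check of why n - r + 1 is the threshold, under a simplified model (an assumption, not LogDevice code): with location-agnostic placement any r distinct nodes can hold a fully replicated write, so a set of responders is sufficient only if no possible copyset fits entirely among the nodes that have not responded. The function names are hypothetical.

```python
from itertools import combinations

def is_f_majority(responded: set[int], nodeset: set[int], r: int) -> bool:
    """Simplified model: any r distinct nodes form a valid copyset.
    The responders suffice if every possible copyset includes at least one
    of them, i.e. no write could be fully replicated on silent nodes only."""
    silent = nodeset - responded
    return not any(set(c) <= silent for c in combinations(sorted(nodeset), r))

def min_f_majority_size(n: int, r: int) -> int:
    """Smallest number of responders that is always sufficient (by symmetry,
    checking one subset of each size k is enough)."""
    nodeset = set(range(n))
    for k in range(n + 1):
        if is_f_majority(set(range(k)), nodeset, r):
            return k
    return n

print(min_f_majority_size(6, 3))  # 4, i.e. n - r + 1
```

Fewer than n - r + 1 responders leaves at least r silent nodes, which is enough room for a complete copyset.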
However, if your replication property is, say, "place 3 copies across 3 racks", then the definition of an f-majority becomes more complicated: e.g. responses from all nodes in the nodeset except those in two racks will also satisfy it.
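A minimal sketch of the rack-aware case, assuming a simplified model (not LogDevice's actual implementation) where a valid copyset is any r nodes on r distinct racks. The responders suffice if the silent nodes cannot host such a copyset; the nodeset layout and function name below are hypothetical.

```python
from itertools import combinations

# Hypothetical nodeset: 8 nodes spread over 4 racks, 2 nodes per rack.
RACK_OF = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C", 6: "D", 7: "D"}

def is_f_majority_racks(responded: set[int], rack_of: dict[int, str], r: int) -> bool:
    """Simplified model: a copyset is r nodes on r distinct racks.
    Return False if some copyset fits entirely among non-responding nodes."""
    silent = [node for node in rack_of if node not in responded]
    for copyset in combinations(silent, r):
        if len({rack_of[node] for node in copyset}) == r:
            return False  # a write could have been fully replicated on silent nodes
    return True

print(is_f_majority_racks({4, 5, 6, 7}, RACK_OF, 3))     # True: only racks A and B are silent
print(is_f_majority_racks({0, 2, 4, 6, 7}, RACK_OF, 3))  # False: A, B and C each have a silent node
```

Here 4 responders are enough, while the location-agnostic formula would require 8 - 3 + 1 = 6; conversely 5 responders can still fall short if the silent nodes span 3 racks.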
> Yes, reads are not released (i.e. are blocked) until sealing is complete.
In which cases is consistency compromised, then? If a client of the log needs consistency, it has to ensure that it has seen all previous updates to the log before making a new update, which implies a read.
> However, if your replication property is, say, "place 3 copies across 3 racks", then the definition of an f-majority becomes more complicated: e.g. responses from all nodes in the nodeset except those in two racks will also satisfy it.
Sure, the aim being that, once those nodes have responded, no write can be acknowledged by enough replicas to complete.
> In which cases is consistency compromised, then? If a client of the log needs consistency, it has to ensure that it has seen all previous updates to the log before making a new update, which implies a read.
Consistency in a more general sense than just read-modify-write consistency. If you have sequencers active in several epochs at the same time accepting writes, the records may end up being written out of order, and there would be a breakage of the total ordering guarantee.
> Consistency in a more general sense than just read-modify-write consistency. If you have sequencers active in several epochs at the same time accepting writes, the records may end up being written out of order, and there would be a breakage of the total ordering guarantee.
But given that reads are blocked on all sequencers before the current one, this should still provide total order atomic broadcast, unless a single client can connect to a sequencer with a lower epoch than one it has already seen.
LogDevice clients do notify sequencers if they have seen newer epochs, which would cause a sequencer reactivation, which indeed resolves the issue within the context of a single client.
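A toy model of the single-client case, purely illustrative: every class and method below is hypothetical, and the epoch store is a stand-in for the ZooKeeper-backed one. The client attaches the highest epoch it has seen to each append; a sequencer that learns of a newer epoch reactivates with a fresh epoch from the store, so that client's writes always land in increasing epochs.

```python
from dataclasses import dataclass

@dataclass
class EpochStore:
    """Hypothetical stand-in for the external epoch store; hands out increasing epochs."""
    last_epoch: int = 0

    def next_epoch(self) -> int:
        self.last_epoch += 1
        return self.last_epoch

@dataclass
class Sequencer:
    store: EpochStore
    epoch: int = 0

    def __post_init__(self) -> None:
        self.epoch = self.store.next_epoch()  # activation takes a fresh epoch

    def append(self, payload: str, client_seen_epoch: int) -> int:
        if client_seen_epoch > self.epoch:
            # The client has seen a newer epoch: reactivate with an even newer one.
            self.epoch = self.store.next_epoch()
        return self.epoch  # epoch in which the record is written

@dataclass
class Client:
    seen_epoch: int = 0

    def write(self, seq: Sequencer, payload: str) -> int:
        epoch = seq.append(payload, self.seen_epoch)
        self.seen_epoch = max(self.seen_epoch, epoch)
        return epoch

store = EpochStore()
old, new = Sequencer(store), Sequencer(store)  # epochs 1 and 2
client = Client()
print(client.write(new, "a"), client.write(old, "b"))  # 2 3
```

Because the store is monotonic, the reactivated epoch is always higher than anything the client has seen. Nothing in this model stops a different client that has never seen epoch 2 from writing to the old sequencer in epoch 1.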
However, there can still be reordering in the context of the wider system. E.g. if client A sends a write (w1) to the sequencer in epoch X, which gets replicated and acknowledged, and after that client B sends a write (w2) to the sequencer in epoch X-1, which also gets replicated and acknowledged (because epoch X-1 is not yet sealed), then readers will eventually see w2 before w1. If writes in epoch X weren't accepted until sealing of epoch X-1 had completed, this reordering would be impossible, but write availability would suffer as a result.
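The scenario above can be sketched by sorting records the way readers deliver them, assuming each record's sequence number orders first by epoch and then by a per-epoch sequence number (modelled here as epoch in the high 32 bits and the per-epoch number in the low 32 bits). The epoch value and per-epoch numbers are made up.

```python
from typing import NamedTuple

class Record(NamedTuple):
    epoch: int
    esn: int  # sequence number within the epoch (assumed model)
    payload: str

def lsn(rec: Record) -> int:
    # Assumption: epoch in the high 32 bits, per-epoch sequence number in the low 32 bits.
    return (rec.epoch << 32) | rec.esn

def reader_order(records: list[Record]) -> list[str]:
    """Readers see records in sequence-number order, not acknowledgement order."""
    return [rec.payload for rec in sorted(records, key=lsn)]

X = 7  # hypothetical current epoch
# Acknowledgement (wall-clock) order: w1 in epoch X first, then w2 accepted
# by the old, not-yet-sealed sequencer in epoch X-1.
acked = [Record(X, 1, "w1"), Record(X - 1, 42, "w2")]
print(reader_order(acked))  # ['w2', 'w1']
```

Every reader still sees the same order, w2 before w1, even though w1 was acknowledged first.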
Ok, but for this to be problematic, readers would need to have some other mechanism to know that w1 did actually take place before w2. So FIFO instead of total order.
Anyhow, thanks for answering my questions. Very interesting system.
Ah, I think you may be talking about the repeatable reads property? All readers in LogDevice are guaranteed to see the same records in the same order (aside from trimmed data).
What I was really wondering was whether LogDevice provides total order atomic broadcast, and as such whether it solves consensus. It appears it does (or rather, it builds on the consensus provided by ZooKeeper and uses its own fencing mechanism, similar to what BookKeeper/Pulsar do).