by twhitmore
1169 days ago
I haven't read the full paper, but several aspects of this sound less than fully reliable. It reminds me specifically of '90s multi-client LAN database systems (dBase, Clipper), where clients coordinated via file locks. Unreliability & hangs became a big problem for us. In the summarized RDMA database, I'd be pretty concerned about reliability & integrity:
1) Crashed servers will leave records locked, and the system will hang.
2) Question whether lock timeouts can be adjudicated reliably.
3) Any errors in server behaviour can easily & widely corrupt data across any other nodes.
4) Overall the RDMA coordination makes me cautious. Can we really replace Paxos with RDMA reliably? If not, problems squeeze out elsewhere.
5) The proposed single-threaded recovery procedure sounds like a hazardous operational bottleneck.
6) I'm also cautious about the coordination required around recovery, or required to transact knowing that recovery is not in progress, unless it can be shown to be reliable & not add cost to the protocol.
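
To make points 1 and 2 concrete, here is a toy sketch in Python. It is not taken from the paper and does not model RDMA; `RecordLock`, `try_acquire`, `write`, and the lease scheme are all hypothetical names and assumptions of mine. It only illustrates the dilemma: without a timeout a crashed holder blocks everyone, and with a timeout a holder that was merely stalled can still write after its lock has been handed to someone else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordLock:
    """Toy lock word for one record (hypothetical fields, not from the paper)."""
    holder: Optional[str] = None
    lease_expires_at: float = 0.0


def try_acquire(lock: RecordLock, client: str, now: float, lease: float) -> bool:
    """Take the lock if it is free or the previous holder's lease has expired."""
    if lock.holder is None or now >= lock.lease_expires_at:
        lock.holder = client
        lock.lease_expires_at = now + lease
        return True
    return False


def write(lock: RecordLock, client: str, believes_holds_lock: bool) -> str:
    """A client writes whenever it *believes* it holds the lock (no fencing)."""
    if not believes_holds_lock:
        return "skipped"
    return "ok" if lock.holder == client else "UNSAFE: wrote without the lock"


lock = RecordLock()
a_holds = try_acquire(lock, "A", now=0.0, lease=5.0)  # A takes the lock
blocked = try_acquire(lock, "B", now=3.0, lease=5.0)  # B must wait out A's lease
b_holds = try_acquire(lock, "B", now=6.0, lease=5.0)  # lease expired, B takes over
# A was only stalled, not dead, and still thinks it owns the record.
late = write(lock, "A", a_holds)
print(a_holds, blocked, b_holds, late)
```

In this sketch the timeout fixes the hang from point 1 but creates the adjudication problem from point 2: nothing stops A's late write unless every write is also checked against the current holder, which is extra protocol cost.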