| Okay, so from reading the developer docs at https://github.com/coreos/torus/Documentation, I'm inferring that:

* Single reader/writer. If you try to set up more, Bad Things happen (even if it's single-writer, you don't get read-after-write consistency).
* It sure sounds like network partitions will allow all kinds of badness.
* If copy 1 goes down, you can keep operating on copy 2, then lose copy 2, have copy 1 come back up, and then warp back in time? Maybe the append-only design prevents that, and you instead just lose the data entirely, because it was only being replicated to one node during the "temporary" failure.

Most egregiously, the Architecture description of an INode implies that persisting a write to disk requires persisting the INode, and INodes are persisted in etcd. That means your entire cluster's write-to-disk throughput is limited to what you can push through a Raft consensus algorithm.

Look, there are all kinds of reasons you could legitimately decide that none of the existing scalable block storage systems satisfy your use case. Maybe containers really are different enough from VMs. But the blog post claims that there just aren't any solutions; the research papers cited in the Documentation page are mostly old and are about systems in a very different part of this sub-space; and what developer documentation exists does not convince me that this is a good idea.

Granted, I've been working on Ceph for 7 years and am a bit of a snob as a result. |
Good work on Ceph, by the way. I've been following your work since it was a PhD project, if I remember correctly.