by blorgle
3463 days ago
This doesn't really make much sense to me. If your systems support software-level replication (Elasticsearch, Cassandra, MySQL, MongoDB all do) then why do you need persistent storage? You just need container scheduling anti-affinity and enough replicas. You only need persistent storage for systems which don't support that replication. Ceph can certainly be deployed so that it is performant for DB workloads. You say "Cinder has the stench of OpenStack" but Cinder is just a Python-based webapp which provides an API to arbitrary storage backends (Ceph RBD, iSCSI, NetApp ONTAP, whatever). How can it be "better now"? It doesn't provide storage on its own. If your ops team was using the default "proof of concept" LVM backend then I could see how you might get a bad impression but that just means your ops team doesn't know much about OpenStack. Am I missing something obvious?
First off, the replication thing. It is true that ES, C*, and Mongo replicate within their cluster mostly automatically. However, this is not without cost. It takes non-trivial amounts of network capacity, disk I/O, and CPU cycles to migrate shards from a failed (or downed) node to a newly stood-up node. Often, many GBs must be moved and for something like ES, where shard replicas reside on many different nodes, that means much of your cluster feels the impact of this. The cluster can heal, but healing isn't easy.
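To put rough numbers on "many GBs must be moved", here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `transfer_seconds`, the 50 GB of shard data, the 1 Gbit/s link, and the guess that only half the link is usable for recovery because the cluster is still serving traffic. It ignores disk and CPU limits, so treat the result as a lower bound at best.

```python
def transfer_seconds(data_gb, link_gbit_per_s, usable_fraction=0.5):
    """Estimate seconds needed to copy data_gb gigabytes across the network.

    data_gb          -- amount of shard data to move, in decimal gigabytes
    link_gbit_per_s  -- nominal link speed in gigabits per second
    usable_fraction  -- share of the link assumed available for recovery
    """
    if data_gb < 0 or link_gbit_per_s <= 0:
        raise ValueError("sizes and speeds must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    data_gbit = data_gb * 8  # bytes -> bits
    return data_gbit / (link_gbit_per_s * usable_fraction)


# Hypothetical case: 50 GB of shards, 1 Gbit/s link, half of it usable.
print(transfer_seconds(50, 1, 0.5))  # 800.0 seconds, a bit over 13 minutes
```

That is per recovery, and the source and destination nodes are both busy for the whole duration.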
Why would a cluster node go down? It's not always hardware failure. CoreOS regularly self-updates and reboots itself without intervention. In a Kubernetes cluster, this is a non-event because pods are simply rescheduled elsewhere and the degradation is momentary. If we were talking about 300 GB of persistent data, though, that's a serious amount of data that will get reshuffled every time there is a node reboot, especially when you consider that an Elasticsearch cluster may span dozens of physical nodes and experience dozens of node reboots in the course of a normal day. Maybe we could hack something together that would disable shard reallocation in ES (there's a setting for this) when scheduled reboots happen, but that's pretty hacky. Besides, ES is just one of a number of different datastores in use at my workplace.
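For the curious, the setting in question is `cluster.routing.allocation.enable`, changed through the `_cluster/settings` endpoint. A minimal sketch of the hack might look like the following. The helper name `allocation_request` and the `localhost:9200` address are made up for illustration, the code only builds the HTTP requests without sending them, and the exact behaviour of transient settings varies between ES versions, so check the docs for yours.

```python
import json
import urllib.request

ALLOCATION_SETTING = "cluster.routing.allocation.enable"
VALID_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_request(base_url, mode):
    """Build (but do not send) a PUT to _cluster/settings that sets
    the shard allocation mode as a transient cluster setting."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s" % (VALID_MODES,))
    body = json.dumps({"transient": {ALLOCATION_SETTING: mode}})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical reboot hook: stop reallocation first, restore it afterwards.
before_reboot = allocation_request("http://localhost:9200", "none")
after_reboot = allocation_request("http://localhost:9200", "all")
# urllib.request.urlopen(before_reboot) would actually send the request.
```

Something has to run this before every scheduled reboot and undo it afterwards, for every cluster, which is exactly the kind of glue that makes it feel hacky.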
As for Cinder, it's reliant on OpenStack APIs which (at least as of Juno) are in turn reliant on things like RabbitMQ. We've seen a number of OpenStack failures due to RabbitMQ partitioning and split-brain scenarios. We're also back to the disk-on-network problem again: SCSI backplane ---ethernet---> client will never be as fast as local disk.