| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by tokenizerrr 3349 days ago
	When running a kubernetes cluster on your own hardware what do you use for storage?

2 comments

samnco 3349 days ago

You have several options:

* Run Ceph in separate nodes and connect it to the cluster. With Juju, you can do that from the bundle, as Ceph is also a supported workloads. This gives you scale for storage

* Run Ceph within the cluster with a Helm chart. We see that for openstack-helm for example. Also gives you scale, but the lack of device discovery makes it less interesting in my opinion

* Run an NFS server, plain easy but not very scalable.

* Use hostpath, which is the default but doesn't get you scale.

tokenizerrr 3349 days ago

So Ceph is the preferred storage provider? I've noticed there is a huge list, including GlusterFS. Do you have experience with any of the other ones?

ferrantim 3348 days ago

Ceph is not a good solution at all for databases. Your performance is going to be terrible.

Ceph is object-based SDS solution designed to take servers with local drives and create a SAN out of them. In order to do this, they take each LUN (Ceph volume) and scatter the data across all nodes in the cluster. They do not assume that applications will run on these servers themselves... they assume compute is elsewhere, like a traditional SAN. The goal is to replace a SAN with servers, not create a converged platform. Also, Ceph was designed during an age where an Intel server did NOT have tier-1 capacity (8 - 20 TB), which is why they shard a volume across so many servers.

This causes a problem for modern applications like Cassandra, Mongo, Kafka etc, where they like to scaleout themselves and want a converged system, where data is not scattered, but on the node where an instance of that cluster runs. Ceph also disrupts (undo) the HA capabilities that these scaleout applications have (For example, a Cassandra instances data will not be on a node on which it thinks it is).

tokenizerrr 3348 days ago

Do you happen to have any suggestions for alternatives to look into?

ferrantim 3344 days ago

You could look at Portworx (disclosure, I work there so am biased, but you can test it yourself for free)

marcoceppi 3349 days ago

Gluster is a good alternative, Ceph can be prickly if you don't want a block device and instead just need a filesystem. CephFS fills sort of the same role, but does so on top of Ceph rbd.

Since GlusterFS /can use/ NFSv4 as a client, it should work with the stuff @samco_23 uses

samnco 3349 days ago

Ah you are right, I forgot about glusterfs. My bad.

Canonical at this stage only supports Ceph commercially, but it doesn't mean GlusterFS is not a good option. I haven't tried it myself, so can't tell.

Anyone?

eicnix 3349 days ago

When you are running on your own hardware you usually have multiple options for storage:

- Use the local node storage although this is very simple but can get complicated on more complex installations

- Connect to your existing storage solution using ISCSI or NFS

- Running your own distributed storage solution on top of Kubernetes for Kubernetes. e.g. https://github.com/rook/rook

aquadrop 3349 days ago

So, how well is Kubernetes suited for working with local hardware? As I understand it's mainly supposed to work with some external abstracted storage, like AWS EBS, cephs, NFS etc But it's much slower that local SSD and for some small local installation maybe be not optimal. Like, running some not large non-critical service which requires database, several workers, some monitoring etc, overall 3-4 local servers. Is Kubernetes a good fit for this or it's only supposed to work with hive-like stateless workers connected to external storage over network?

samnco 3349 days ago

You have several options for this. If it is non HA, then you can pin a RC to a specific node, and use hostpath storage. if the container fails, it will always respawn on the same node, maximizing uptime and also having max capacity from your local SSD. Alternatively, you can also run rook, which is backed by Ceph, and use affinity to make sure that your pods are very close to storage, and gain back some of the speed.

In general and in my opinion, it is always better to run with k8s since you have, for the stateless pieces, cluster awareness. So there is never a downside to it, especially as the control plane is very lightweight for small clusters, and you can colocate many parts.

aquadrop 3349 days ago

Thanks for the answer. How easy will it be to transfer this cluster to another set of servers (with data copy)? Like, stop the service for several minutes, push the button "Transfer" and start service on new servers after that. As I understand you'll need rook for something like that?

samnco 3349 days ago

Typically, you would have a set of "helm charts" (packages) for your application(s). So deploying, without data, would be something like a sequence of "helm install app-appId --values /path/to/config/for/environment.yaml"

At this stage, you have no data. Normally, if you app depends on data, the pods will keep failing until data has been moved and is available (at which point they stabilize in an equilibrium state).

If you run Ceph in both environments, then you may want to use Ceph replication. If you use another storage layer, then it's also up to you to make sure the data is moved around.

rook should have support for that use case. However, it is still alpha as per the GitHub readme, so use at your own risk. However, you may want to consider the data replication problem outside of the scope of k8s.

If the 2 sets of clusters are physically close to each other, you may want to just point the new apps to the old data and pull the switch from the first one.

Another option would be to run another beta feature in K8s called Federation, which allows to manage several cluster via a single control plane.

Sorry for the long comment, your question has a broad scope, and it's hard to answer without diving into details.

aquadrop 3348 days ago

Thanks for your explanation. Looks like k8s is designed by Google with Google in mind - where data is somewhere over network and main problem is to manage code. I just thought it could be perfect solution to completely abstract and isolate such small services and move them around on local hardware with one push of a button, but it seems I'll have to think about external ways to manage data replication.

ferrantim 3348 days ago

You can use Portworx[1] to transfer data between servers and datacenters. It has a native kubernetes driver, is production ready being used by GE and Lufthansa airlines among others and does sync and async replication of data between environments. (disclosure I work for Portworx). [1] https://portworx.com/

aquadrop 3348 days ago

Yes, looks interesting, but seems enterprisey and not open source?

eicnix 3349 days ago

You can run distributed stateful workloads on kubernetes with local storage and disable rescheduling when the node goes down. This means that you need to manually migrate/restart workloads if a node goes down. There is work being done on improving handling local node storage https://github.com/kubernetes/community/pull/306 but it doesn't scale as well as network storage.

If you have an existing SAN solution you can connect to it via fiber channel over iSCSI.