by Seanny123 3353 days ago
If you'd like to engage with me further, how does a company know it needs Kubernetes? If I'm Soylent and I'm processing a few orders a minute, I'm probably safe with a few redundant monoliths. Do I have to be Uber? What's the middle-ground between Soylent and Uber that would still need this?

Is the answer the same as the question "who needs a microservice architecture"?


There are a few killer features that you would benefit from at any size and that I really love:

* Self healing: when you create a deployment/replica set, it will be maintained at all costs, so if the app has a memory leak or anything else goes wrong, the failure will be contained and the app kept up and running

* Rolling update: even when you run only 5 frontends, it is a pain to use Capistrano or other tools just to update a git repo. It is literally a one-liner in Kubernetes. If you use CI/CD, the setup is just a few lines in any Jenkins/GitLab/Travis...

* Service discovery: the combination of ENV and predictable DNS endpoints is just awesome

* Ecosystem: PaaS, Serverless... Much of the new-world infra is built on K8s, so it is a door to the next gen, whether or not you know yet that you will use it.
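To make the self-healing and rolling-update points concrete, here is a minimal sketch of a Deployment manifest built as a plain Python dict and printed as JSON, which `kubectl apply -f -` accepts. The app name, image, replica count and memory limit are hypothetical, and `apps/v1` is the current API group (clusters from the time of this thread used a beta group).

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest as a plain dict (kubectl also accepts JSON)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Self healing: the controller keeps this many pods running.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Rolling update: replace pods gradually instead of all at once.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # A memory limit contains a leaking process: the
                            # container is killed at the limit and restarted.
                            "resources": {"limits": {"memory": "256Mi"}},
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; pipe the output to `kubectl apply -f -`.
    manifest = build_deployment("frontend", "example/frontend:1.0.0", 5)
    print(json.dumps(manifest, indent=2))
```

With something like this applied, the one-liner for a rolling update would be along the lines of `kubectl set image deployment/frontend frontend=example/frontend:1.0.1`.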

As for Micro Service Architecture, just starting with the web frontend and a couple of lightly dockerized middleware services makes it sooooo simple that you instantly want to get more out of it.

As the overhead of running K8s vs. a set of servers is relatively low, especially at small scale, it is definitely worth looking at. Happy to do a run-through with you and show you how the deployment of a tiered app works as a demo. Ping me on @SaMnCo_23 if interested.

When running a Kubernetes cluster on your own hardware, what do you use for storage?
You have several options:

* Run Ceph on separate nodes and connect it to the cluster. With Juju, you can do that from the bundle, as Ceph is also a supported workload. This gives you scale for storage.

* Run Ceph within the cluster with a Helm chart. We see that for openstack-helm for example. Also gives you scale, but the lack of device discovery makes it less interesting in my opinion

* Run an NFS server, plain easy but not very scalable.

* Use hostPath, which is the default but doesn't get you scale.
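For the two simplest options in that list, a rough sketch of what the PersistentVolume objects could look like, again as Python dicts dumped to JSON for `kubectl apply -f -`. The server address, export path, local path and sizes are made-up placeholders; a Ceph-backed volume needs more cluster-specific settings and is left out.

```python
import json


def nfs_volume(name: str, server: str, path: str, size: str = "10Gi") -> dict:
    """PersistentVolume backed by an NFS export, mountable from many nodes."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            "nfs": {"server": server, "path": path},
        },
    }


def host_path_volume(name: str, path: str, size: str = "10Gi") -> dict:
    """PersistentVolume backed by a directory on whichever node runs the pod."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "hostPath": {"path": path},
        },
    }


if __name__ == "__main__":
    # Placeholder server and paths; adjust to your own environment.
    print(json.dumps(nfs_volume("shared-data", "10.0.0.10", "/exports/data"), indent=2))
    print(json.dumps(host_path_volume("local-data", "/mnt/ssd/data"), indent=2))
```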

So Ceph is the preferred storage provider? I've noticed there is a huge list, including GlusterFS. Do you have experience with any of the other ones?
Ceph is not a good solution at all for databases. Your performance is going to be terrible.

Ceph is an object-based SDS solution designed to take servers with local drives and create a SAN out of them. To do this, it takes each LUN (Ceph volume) and scatters the data across all nodes in the cluster. It does not assume that applications will run on these servers themselves; it assumes compute is elsewhere, as with a traditional SAN. The goal is to replace a SAN with servers, not to create a converged platform. Also, Ceph was designed in an age when an Intel server did NOT have tier-1 capacity (8 - 20 TB), which is why it shards a volume across so many servers.

This causes a problem for modern applications like Cassandra, Mongo, Kafka, etc., which like to scale out themselves and want a converged system where data is not scattered but lives on the node where an instance of that cluster runs. Ceph also disrupts (undoes) the HA capabilities that these scale-out applications have (for example, a Cassandra instance's data will not be on the node it thinks it is on).

Do you happen to have any suggestions for alternatives to look into?
Gluster is a good alternative. Ceph can be prickly if you don't want a block device and instead just need a filesystem. CephFS fills sort of the same role, but does so on top of Ceph's RADOS object store.

Since GlusterFS /can use/ NFSv4 as a client, it should work with the stuff @samco_23 uses

Ah you are right, I forgot about glusterfs. My bad.

Canonical at this stage only supports Ceph commercially, but it doesn't mean GlusterFS is not a good option. I haven't tried it myself, so can't tell.

Anyone?

When you are running on your own hardware you usually have multiple options for storage:

- Use the local node storage. This is very simple, but can get complicated on more complex installations

- Connect to your existing storage solution using iSCSI or NFS

- Run your own distributed storage solution on top of Kubernetes, for Kubernetes, e.g. https://github.com/rook/rook

So, how well is Kubernetes suited for working with local hardware? As I understand it, it's mainly supposed to work with some external, abstracted storage, like AWS EBS, Ceph, NFS, etc. But that's much slower than a local SSD, and for a small local installation it may not be optimal. For example, running a small, non-critical service that requires a database, several workers, some monitoring, etc., on 3-4 local servers overall. Is Kubernetes a good fit for this, or is it only supposed to work with hive-like stateless workers connected to external storage over the network?
You have several options for this. If it is non-HA, you can pin an RC (replication controller) to a specific node and use hostPath storage. If the container fails, it will always respawn on the same node, maximizing uptime while getting the most out of your local SSD. Alternatively, you can run Rook, which is backed by Ceph, and use affinity to keep your pods close to their storage and gain back some of the speed.
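The first option (an RC pinned to one node with hostPath storage) might look roughly like the sketch below. The node name, image and paths are hypothetical; `kubernetes.io/hostname` is the standard node label used here for pinning. The Rook/affinity variant is not shown.

```python
import json


def build_pinned_rc(name: str, image: str, node: str,
                    host_path: str, mount_path: str) -> dict:
    """ReplicationController whose single pod always lands on `node`."""
    labels = {"app": name}
    return {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the pod: it can only be scheduled onto this node,
                    # so a restarted pod finds its data where it left it.
                    "nodeSelector": {"kubernetes.io/hostname": node},
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "volumeMounts": [
                                {"name": "data", "mountPath": mount_path}
                            ],
                        }
                    ],
                    # hostPath: a directory on the node's own disk (local SSD).
                    "volumes": [
                        {"name": "data", "hostPath": {"path": host_path}}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical database pinned to a node called "node-1".
    rc = build_pinned_rc("db", "postgres:16", "node-1",
                         "/mnt/ssd/pgdata", "/var/lib/postgresql/data")
    print(json.dumps(rc, indent=2))
```

The trade-off is the one described above: if `node-1` itself goes down, nothing reschedules the pod elsewhere, because the data only exists on that node.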

In general and in my opinion, it is always better to run with k8s since you have, for the stateless pieces, cluster awareness. So there is never a downside to it, especially as the control plane is very lightweight for small clusters, and you can colocate many parts.

Thanks for the answer. How easy will it be to transfer this cluster to another set of servers (with data copy)? Like, stop the service for several minutes, push the button "Transfer" and start service on new servers after that. As I understand you'll need rook for something like that?
You can run distributed stateful workloads on Kubernetes with local storage and disable rescheduling when the node goes down. This means that you need to manually migrate/restart workloads if a node goes down. There is work being done on improving the handling of local node storage https://github.com/kubernetes/community/pull/306 but it doesn't scale as well as network storage.

If you have an existing SAN solution, you can connect to it via Fibre Channel or iSCSI.

In my opinion, if you are running containers in prod, you need to be on Kubernetes, regardless of scale.

Kubernetes is so much more than just "planet scale". It encourages patterns and mindsets for efficient software delivery that can really pay dividends.

Here are some of my favorite things:

Cloud agnostic:

Your team and business are not at the mercy of the pricing, features or availability of a third party. You can run it on everything from a massive cluster on AWS to some cheap mini computers off eBay: https://hackernoon.com/diy-kubernetes-cluster-with-x86-stick... Moving between cloud providers when they both run Kubernetes is fairly trivial. You can also run on multiple clouds at the same time. Kubernetes abstracts the infrastructure away. It's also really easy to run a single-node cluster on your own machine for local development. Try doing that with AWS services in a reliable way.

Immutable infrastructure:

The fact that containers don't hold state FORCES you to develop your applications in a 12-factor pattern. Deploying images by tag forces you to create a pipeline that automates their builds. It also allows you to roll back effortlessly. It's not an afterthought or something you need to glue together.

High availability:

Just define how many replicas of your service you want and k8s does the rest. If they crash, so what? Not only will they be restarted automatically, they will also be distributed across your fleet for you. Node goes down? Who cares. It's self-healing.
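The idea behind "define the replica count and k8s does the rest" is a control loop that keeps comparing desired state with observed state. Below is a deliberately tiny toy model of that idea, not how the real controllers or scheduler are implemented; the pod names, node names and least-loaded placement rule are all invented for illustration.

```python
from typing import Dict, List


def reconcile(desired: int, pods: Dict[str, str], nodes: List[str],
              next_id: int) -> int:
    """One pass of a toy control loop.

    `pods` maps pod name -> node name and is modified in place.
    Returns the next unused pod id.
    """
    # Forget pods that were running on nodes which have disappeared.
    for pod in [p for p, node in pods.items() if node not in nodes]:
        del pods[pod]

    # Create replacements until the desired count is reached again,
    # always picking the node that currently runs the fewest pods.
    while len(pods) < desired and nodes:
        load = {n: 0 for n in nodes}
        for node in pods.values():
            load[node] += 1
        target = min(nodes, key=lambda n: load[n])
        pods[f"web-{next_id}"] = target
        next_id += 1
    return next_id


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    pods: Dict[str, str] = {}
    next_id = reconcile(3, pods, nodes, 0)
    print("initial placement:", pods)

    nodes.remove("node-b")  # a node goes down
    reconcile(3, pods, nodes, next_id)
    print("after node loss:  ", pods)
```

Nobody "repairs" anything explicitly here: the loop just notices that reality no longer matches the declared replica count and closes the gap.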

Service discovery:

Just put a k8s service in front of your application replicas and everything is automatic. Nothing to install: simply refer to the stable DNS service name and everything will be routed. Software agnostic.
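For reference, the names involved are predictable enough to compute by hand. The helper functions below are my own illustration (they are not part of any Kubernetes client library), and `cluster.local` is only the usual default cluster domain, so treat it as an assumption about your cluster.

```python
from typing import Tuple


def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_env_names(service: str) -> Tuple[str, str]:
    """Names of the host/port env vars injected into pods for a Service.

    The service name is upper-cased and dashes become underscores.
    """
    prefix = service.upper().replace("-", "_")
    return f"{prefix}_SERVICE_HOST", f"{prefix}_SERVICE_PORT"


if __name__ == "__main__":
    # Hypothetical service name.
    print(service_dns_name("redis-master"))
    # -> redis-master.default.svc.cluster.local
    print(service_env_names("redis-master"))
    # -> ('REDIS_MASTER_SERVICE_HOST', 'REDIS_MASTER_SERVICE_PORT')
```

Inside the same namespace the short name (`redis-master`) resolves too. Note that the env vars are only set for services that already existed when the pod started, which is one reason DNS tends to be the more convenient of the two.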

Config Management:

Very easy to inject secrets and configs as env vars or mounted into the pod. No third party library or framework needed to leverage it.
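On the application side this needs nothing beyond the standard library. A minimal sketch, assuming the pod spec injects a `LOG_LEVEL` env var and mounts a secret under `/etc/app-secrets` with a key called `api-token` (all three names are hypothetical):

```python
import os
from pathlib import Path


def load_config(secret_dir: str = "/etc/app-secrets") -> dict:
    """Read settings from env vars and from files mounted into the pod."""
    # Plain config arrives as an environment variable (hypothetical name).
    config = {"log_level": os.environ.get("LOG_LEVEL", "info")}

    # A mounted secret shows up as one file per key in the mount directory.
    token_file = Path(secret_dir) / "api-token"
    if token_file.is_file():
        config["api_token"] = token_file.read_text().strip()
    return config


if __name__ == "__main__":
    # Outside a pod the secret file is usually absent, so only the
    # env-derived settings are printed.
    print(load_config())
```

Because the app only sees env vars and files, the same code runs unchanged on a laptop, in CI and in the cluster.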

Dev - Stage - Prod envs made easy:

The same container image can move through each env effortlessly and you can be sure there is no "artifact rot"

Extensible and open:

You can run different container runtimes such as rkt, or different pod networks and persistent storage options. There is not a single company trying to steer it in some way. Also, with Helm charts it has recently become very easy to "apt-get install" whatever you want on your cluster. Very powerful and portable.

It does take some time getting ramped up but once it clicks there is no turning back.