by wora 2711 days ago
I saw many comments about stateful workloads. I am not sure this is necessarily an issue in a cloud environment.

Within a zone or a cluster, network latency is about 1ms, which is lower than the access time of most hard disks. The network bandwidth is on par with disk throughput. What we really need is a faster database and faster object storage that can match the network performance (1ms and 10Gbps); then all workloads can be stateless.
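A rough back-of-the-envelope sketch of that comparison. The 1ms and 10Gbps figures come from the comment above; the hard-disk figures (5ms seek, 150MB/s sequential) are illustrative assumptions, not measurements, and the model ignores queuing and protocol overhead.

```python
# Simple model: time to read a payload = fixed latency + size / throughput.
NETWORK_LATENCY_S = 1e-3          # 1 ms, from the comment
NETWORK_BYTES_PER_S = 10e9 / 8    # 10 Gbps expressed in bytes per second

# Assumed hard-disk figures, for illustration only.
DISK_SEEK_S = 5e-3
DISK_BYTES_PER_S = 150e6


def transfer_time(size_bytes, latency_s, bytes_per_s):
    """Return the estimated seconds to fetch size_bytes in one request."""
    return latency_s + size_bytes / bytes_per_s


def network_time(size_bytes):
    return transfer_time(size_bytes, NETWORK_LATENCY_S, NETWORK_BYTES_PER_S)


def disk_time(size_bytes):
    return transfer_time(size_bytes, DISK_SEEK_S, DISK_BYTES_PER_S)


if __name__ == "__main__":
    for size in (4 * 1024, 1024 * 1024, 100 * 1024 * 1024):
        print(f"{size:>11} bytes: "
              f"network {network_time(size) * 1e3:8.2f} ms, "
              f"disk {disk_time(size) * 1e3:8.2f} ms")
```

For small reads the fixed latency dominates both paths; for large reads the throughput term takes over, which is where the "on par" claim depends on the actual disk.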

If one uses a VM on GCP, the VM has no local storage besides the small local SSDs. Practically even the VM is stateless besides some cache.

3 comments

> The network bandwidth is on par with disk throughput

Yes, and most storage you have access to, in cloud environments, is network attached. GCP disks, AWS EBS volumes, etc. All network and outside the hypervisors. You may have some local storage, but that's ephemeral, by design.

However, since we are talking about Kubernetes: not only are VMs ephemeral, but your containers are ephemeral too! And they can move around. So now you (or rather, K8s) need to figure out which worker node has the pod and which storage is assigned to it, and then attach/detach accordingly.

This is what persistent volumes and persistent volume claims give you. They actually work fine already for StatefulSets.
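As a hedged illustration, here is roughly what that looks like when a StatefulSet asks for one claim per pod through `volumeClaimTemplates`. The manifest is built as a Python dict and printed as JSON; the name `db`, the image `postgres:16`, the mount path, and the 10Gi size are all hypothetical, and the `serviceName` assumes a headless Service of the same name is defined elsewhere.

```python
import json


def statefulset_manifest(name, image, mount_path, storage="10Gi", replicas=3):
    """Build a StatefulSet whose pods each get their own PersistentVolumeClaim."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumes a headless Service with this name
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Mount the per-pod claim declared below.
                        "volumeMounts": [
                            {"name": "data", "mountPath": mount_path},
                        ],
                    }],
                },
            },
            # One claim is created per replica and follows the pod identity.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    manifest = statefulset_manifest("db", "postgres:16", "/var/lib/postgresql/data")
    print(json.dumps(manifest, indent=2))
```

The point of the sketch is the link between `volumeMounts` and `volumeClaimTemplates`: when a pod is rescheduled to another node, the same claim (and the volume bound to it) is reattached there.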

Now, if you are in a cloud environment you should look into the possibility of using the hosted database offerings. If you can (even at a price premium), that's a great deal of complexity you are going to avoid.

With stateless services, forwarding requests to underlying storage and serialization can dominate resource consumption. After all, some services will do little besides fetch the right content and transform the data somehow.

Addressing this requires caching data in memory while making sure those caches are also disjoint, so that you fully utilize your cluster memory. This has driven Google (and others) to make some services semi-stateful and build dynamic sharding infrastructure to make this easier [1].

[1] https://ai.google/research/pubs/pub46921
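A minimal sketch of the "disjoint caches" idea, assuming a fixed list of node names and using rendezvous hashing to pick one owner per key. This is an illustrative stand-in, not the mechanism described in [1]; `ShardedCache` and `owner` are hypothetical names, and the per-node dicts simulate in one process what would be separate machines.

```python
import hashlib


def owner(key, nodes):
    """Pick the single node responsible for caching key (rendezvous hashing)."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


class ShardedCache:
    """Simulates per-node in-memory caches that never hold the same key twice."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.store = {node: {} for node in self.nodes}

    def get(self, key, load):
        """Return (node, value), calling load(key) only on a cache miss."""
        node = owner(key, self.nodes)
        cache = self.store[node]
        if key not in cache:
            cache[key] = load(key)
        return node, cache[key]


if __name__ == "__main__":
    cache = ShardedCache(["node-a", "node-b", "node-c"])
    for i in range(10):
        node, value = cache.get(f"user:{i}", lambda k: k.upper())
        print(node, value)
```

Because every key maps to exactly one node, the total cached data is spread across the cluster's memory instead of being duplicated on every replica.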

From a database perspective, 1ms to disk is an eternity.

A good disk subsystem had less write latency than that in the early 90’s.

A write to disk has no practical latency because of the write buffer, whether in the local file system or a remote database. A flush to disk would be slow unless you use an SSD.
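One way to see the difference between a buffered write and a flush is to time both. This is a small sketch using `os.write` and `os.fsync` on a temporary file; the 1MiB payload is an arbitrary choice, and the timings depend entirely on the machine, file system, and storage device, so no expected numbers are given.

```python
import os
import tempfile
import time


def timed_write(path, payload, flush):
    """Write payload to path and return elapsed seconds.

    With flush=True the call also waits for os.fsync, which asks the OS to
    push the buffered data to the storage device.
    """
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        if flush:
            os.fsync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = b"x" * (1024 * 1024)
    with tempfile.TemporaryDirectory() as tmp:
        buffered = timed_write(os.path.join(tmp, "buffered.bin"), payload, flush=False)
        flushed = timed_write(os.path.join(tmp, "flushed.bin"), payload, flush=True)
    print(f"buffered write: {buffered * 1e3:.3f} ms")
    print(f"write + fsync:  {flushed * 1e3:.3f} ms")
```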

On the other hand, a single machine has limited reliability. If one wants high availability, one needs to dual-write to another machine, which also incurs network latency.

I don’t think that’s true. I recall ~5ms seek times being top of the line.

he said 'subsystem', not 'disk'.

what was the latency to the controller with ram cache?

using seek time as a measure is also somewhat worst case - controllers/filesystems also queue(d) according to drive geometry.
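A toy sketch of why queuing changes the picture, assuming requests are identified only by track number and that head travel is proportional to the track distance. Real controllers also account for rotational position and more, so this is a simplification; the function names and the request list are hypothetical.

```python
def head_travel(start, tracks):
    """Total tracks the head crosses when servicing requests in the given order."""
    travel, position = 0, start
    for track in tracks:
        travel += abs(track - position)
        position = track
    return travel


def elevator_order(start, tracks):
    """Sweep upward from start, then service the remaining requests on the way back."""
    upward = sorted(t for t in tracks if t >= start)
    downward = sorted((t for t in tracks if t < start), reverse=True)
    return upward + downward


if __name__ == "__main__":
    start = 50
    requests = [95, 10, 60, 5, 80]  # arrival order
    print("arrival order:", head_travel(start, requests))
    print("elevator order:", head_travel(start, elevator_order(start, requests)))
```

Reordering the queue means most requests pay far less than a full random seek, which is why a single worst-case seek time overstates the latency of a busy subsystem.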