by dehora 2239 days ago
Yes for sure, in terms of expansion/adoption. It's not so certain in terms of function/utility. As a Google outsider, gRPC really does seem like Stubby for the rest of us (with balancing left as a tradeoff for the community). Kubernetes does not seem, functionally, to be a Borg/Omega at all. It's more like a porcelain for running Heroku/12-factor/nanoservice-style workloads on top of a Borg-like (that's no small thing, but it is just a thing), after learning what Amazon learned out of the gate on AWS: that developers will not be constrained on framework choices (i.e. they're not ready to settle on a PaaS).

To that extent, there's a hole left in dealing with things that do need to consider state, as opposed to running networked API services. Kubernetes seems to have no good story here, whether it's the evolutionary progress happening around StatefulSets/PVC/PV, or a per-appliance operator for you, and you, and you, each of which punches a hole as big as you like in the scheduling abstraction. Streaming, for example, is a notable pain, but pretty much every OSS project created in this century that has state needs a compensating tool, typically an operator, to function on Kubernetes. I'm not even sure at this point whether state is a design consideration that can be retrofitted. That's not a criticism of Kubernetes, but it is a complication and investment factor for stateful workloads. So what may happen, should Kubernetes turn out to be a long-term technology, is that the entire software industry offloads state management to vendors (i.e. to a handful of cloud services), or something in open source is created to fill the infrastructure gap for state management.


I don't think you can count on Google solving stateful for k8s, because within Google all storage devices and data thereupon are, to a fair approximation, totally disposable. There is nothing at Google considered a "stateful service" the way k8s community members mean it, e.g. a mysql server with critical local files.

In my personal opinion it is more valuable to adopt the Google model where no local file is considered critical, than it is to try to cram statefulness into an otherwise cloud-native stack. I feel that if you still care about specific files on specific disks then you really haven't fully adopted the meaning of cloud-native.

I agree with pretty much all of this. The point I'm making and not conveying well is that the state eventually has to be held somewhere. So not that it has to colocate with application services, but that if you want to store something with something, then Kubernetes as the substrate makes things harder. Even where you have say replication built in and are not reliant on 'the' file or disk, you tend to need to bolt on an operator to handle replication/placement. An example is Etcd used within Kubernetes. You don't get to just run Etcd on K8s itself; it needs an operator which was about 9KLOC last time I looked.
How does Google handle disk storage? From what I remember of the papers, GFS depends on a disk service that exposes the disk resources. And how does YouTube store MySQL data? Does it use GFS or some other mechanism?
I haven't worked at Google, but from what I've read, only GFS or Colossus worries about disk storage; every other application writes only to GFS and has no dependency on local disks.
That's not quite true, or at least not how it worked 5 years ago.

Jobs can write to local disk. Where do you think their executables came from? They can also use local disk as a scratch space for various things. The main use was writing out logs. Logs get written then picked up by a co-located job and saved out to disk clusters like Colossus, where they are then in turn picked up for processing.
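
To make the log pattern above concrete, here's a rough Python sketch of the "write locally, let a co-located job ship it out" idea. This is not Google's code or any real Borg API: `ship_logs`, `scratch_dir` and `durable_dir` are names made up for illustration, and a plain directory stands in for the cluster filesystem.

```python
import shutil
import tempfile
from pathlib import Path


def ship_logs(scratch_dir: Path, durable_dir: Path) -> list[str]:
    """Copy finished *.log files out of local scratch, then delete the local copy.

    Hypothetical stand-ins: scratch_dir plays the role of transient local disk,
    durable_dir the role of durable cluster storage.
    """
    durable_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for log_file in sorted(scratch_dir.glob("*.log")):
        # Copy before deleting, so the local file is only removed
        # once a second copy exists elsewhere.
        shutil.copy2(log_file, durable_dir / log_file.name)
        log_file.unlink()
        shipped.append(log_file.name)
    return shipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / "scratch"
        durable = Path(tmp) / "durable"
        scratch.mkdir()
        (scratch / "job-0.log").write_text("request served\n")
        print(ship_logs(scratch, durable))
```

The point of the shape is that nothing on local disk matters once the shipper has run, so evicting the job only costs you whatever hasn't been picked up yet.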

However it's true that local disk was considered to be used only for transient data, most of the time. You could configure Borg to not work like that, so you could certainly run MySQL clusters on it and things, but that was a very unusual setup and not at all recommended. After all, then you'd have to manage backups and machine failure yourself.

This practice worked amazingly well when writing software entirely in house. So it was great for things like web search where there was no open source stack waiting to be adopted. It was disastrous for backwards compatibility with pre-written software, and I think this sort of thing contributed to Google's notoriously strong NIH syndrome. Using open source software at Google is hard; there are processes around it that must be satisfied, but more importantly, that software will expect POSIX file APIs to actually work and on Borg they don't. They appear to work, right up until your job gets evicted and then the data goes poof. To solve this you need to work with files using Google's own proprietary file VFS APIs that are backed by RPC clustered storage. GFS then Colossus presented a FS API that was only vaguely related to POSIX, so you couldn't even just hack together a quick bridge.

Net/net it was worth importing small in-memory libraries from the open source world, very rarely, but anything more substantial like a server - forget it. This could lead to absurd outcomes. I worked on a project there many years ago that would have really benefited from using Apache to do file serving, but Apache didn't integrate with Borg so I ended up using "static GWS" as it was called at the time. But static GWS had just enough features to serve websites written exactly how Googlers wrote it and nothing more, so it was a total fail at serving a third party developed website we'd agreed to host. Much pain.

One of the problems with Docker and container architectures in my mind is that they ultimately evolved out of Borg. The cgroups work in the kernel and other kernel features were developed by Paul Menage and others on the Borglet team for Google's internal use. Then the Docker guys picked them up and created this notion of containers that contain a whole OS - not at all how Google does it - but they kept the notion of transient local storage. Well that's a disaster, because normal software expects local disk to stick around. In my post-Google career I have witnessed more than one disaster caused by Docker of the form "whoops, we misconfigured our Dockerfiles and just erased our private key". There's no good justification for that. For most people systemd with some simple use of static linking would work just as well. That's what Borglets used to do - set up cgroups, have a simple base OS and then run statically linked binaries.

> local disk. Where do you think their executables came from?

Just FYI, a lot changes in 5 years.

How is it different now? Is there no local disk? Is it more like a serverless approach?