
That's not quite true, or at least not how it worked 5 years ago.

Jobs can write to local disk. Where do you think their executables came from? They can also use local disk as a scratch space for various things. The main use was writing out logs. Logs get written then picked up by a co-located job and saved out to disk clusters like Colossus, where they are then in turn picked up for processing.
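
For anyone who hasn't seen the pattern, here's a rough sketch of the "write logs locally, let a co-located job ship them out" idea. This is not how Google's log saver actually worked; the function name, the `.done` suffix for finished files, and the use of a plain directory as a stand-in for cluster storage like Colossus are all assumptions made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def ship_finished_logs(scratch_dir: Path, durable_dir: Path) -> list[str]:
    """Copy finished logs from local scratch to durable storage, then delete them locally.

    A log counts as finished once its writer has renamed it to end in ".done"
    (a convention invented for this sketch). durable_dir stands in for a
    cluster filesystem; a real shipper would talk to it over RPC.
    """
    durable_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for path in sorted(scratch_dir.glob("*.log.done")):
        partial = durable_dir / (path.name + ".partial")
        shutil.copyfile(path, partial)
        # Rename only after the copy completes, so readers of durable_dir
        # never pick up a half-written file.
        partial.replace(durable_dir / path.name)
        # The local copy is now disposable, which is the whole point:
        # losing the machine after this line loses nothing.
        path.unlink()
        shipped.append(path.name)
    return shipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        scratch = Path(root) / "scratch"
        durable = Path(root) / "durable"
        scratch.mkdir()
        (scratch / "a.log.done").write_text("request served\n")
        (scratch / "b.log").write_text("still being written\n")
        print(ship_finished_logs(scratch, durable))  # prints ['a.log.done']
```

The important property is that local disk only ever holds data that is either still being written or already copied elsewhere, so an eviction costs you at most the logs that haven't been shipped yet.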

However, it's true that local disk was meant to hold only transient data, most of the time. You could configure Borg to not work like that, so you could certainly run things like MySQL clusters on it, but that was a very unusual setup and not at all recommended. After all, then you'd have to manage backups and machine failure yourself.

This practice worked amazingly well when writing software entirely in house. So it was great for things like web search where there was no open source stack waiting to be adopted. It was disastrous for backwards compatibility with pre-written software, and I think this sort of thing contributed to Google's notoriously strong NIH syndrome. Using open source software at Google is hard; there are processes around it that must be satisfied, but more importantly, that software will expect POSIX file APIs to actually work, and on Borg they don't. They appear to work, right up until your job gets evicted, and then the data goes poof. To solve this you need to work with files using Google's own proprietary VFS APIs, which are backed by RPC-based clustered storage. GFS, and then Colossus, presented a filesystem API that was only vaguely related to POSIX, so you couldn't even just hack together a quick bridge.

Net/net, it was worth importing small in-memory libraries from the open source world, very rarely, but anything more substantial like a server - forget it. This could lead to absurd outcomes. I worked on a project there many years ago that would have really benefited from using Apache to do file serving, but Apache didn't integrate with Borg, so I ended up using "static GWS", as it was called at the time. But static GWS had just enough features to serve websites written exactly how Googlers wrote them and nothing more, so it was a total fail at serving a third-party-developed website we'd agreed to host. Much pain.

One of the problems with Docker and container architectures, in my mind, is that they ultimately evolved out of Borg. The cgroups work and other kernel features were developed by Paul Menage and others on the Borglet team for Google's internal use. Then the Docker guys picked them up and created this notion of containers that contain a whole OS - not at all how Google does it - but they kept the notion of transient local storage. Well, that's a disaster, because normal software expects local disk to stick around. In my post-Google career I have witnessed more than one Docker-induced disaster of the form "whoops, we misconfigured our Dockerfiles and just erased our private key". There's no good justification for that. For most people, systemd with some simple use of static linking would work just as well. That's what Borglets used to do - set up cgroups, have a simple base OS and then run statically linked binaries.