by susscrofa 1096 days ago
The Zanzibar paper has a section on the consistency model, which says that the race conditions outlined are solved by respecting update order. It then solves it by using Spanner as underlying storage (which is kind of lazy).

How does Warrant deal with consistency?


You've highlighted a very important part of the paper. A lot of the external consistency guarantees provided by Zanzibar are facilitated by Spanner and its TrueTime mechanism. Warrant doesn't currently support or use Spanner. However, for the databases we do support (MySQL and Postgres, which are both ACID compliant), we've implemented the zookie protocol using the incrementing transaction ids they provide. This approach works for single-writer deployments of these databases, so be aware that write throughput and overall availability will be lower. We started with this approach because most teams still use MySQL/Postgres. Warrant is built to support running on different types of databases, so we will be working on support for Spanner and other multi-writer distributed databases like Cockroach and Yugabyte in the future. I hope that helps.
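For anyone wondering what "the zookie protocol using incrementing transaction ids" might look like, here is a minimal in-memory sketch. It is not Warrant's code. The names `SingleWriterStore`, `Checker`, `encode_zookie` and `decode_zookie` are all hypothetical, and a Python dict stands in for a single-writer MySQL/Postgres primary. The idea: every write gets the next transaction id and returns it to the client as an opaque zookie. A later check that carries that zookie must be answered from a snapshot at least that fresh, even if the checker would otherwise use an older cached snapshot. That is how the "new enemy" race (removing Bob, then adding content he shouldn't see) gets avoided by respecting update order.

```python
import base64
from dataclasses import dataclass
from typing import Optional

# Hypothetical relation tuple: (object, relation, user), e.g. ("doc:readme", "viewer", "user:bob").
RelTuple = tuple[str, str, str]


def encode_zookie(txid: int) -> str:
    # Zookies are opaque to clients; this sketch just base64-encodes the transaction id.
    return base64.urlsafe_b64encode(str(txid).encode()).decode()


def decode_zookie(zookie: str) -> int:
    return int(base64.urlsafe_b64decode(zookie.encode()).decode())


@dataclass
class Row:
    created_txid: int
    deleted_txid: Optional[int] = None


class SingleWriterStore:
    """Stand-in for a single-writer database. Writes are serialized, so txid order is commit order."""

    def __init__(self) -> None:
        self.last_txid = 0
        self.rows: dict[RelTuple, list[Row]] = {}

    def _next_txid(self) -> int:
        self.last_txid += 1
        return self.last_txid

    def write(self, t: RelTuple) -> str:
        txid = self._next_txid()
        self.rows.setdefault(t, []).append(Row(created_txid=txid))
        return encode_zookie(txid)

    def delete(self, t: RelTuple) -> str:
        txid = self._next_txid()
        for row in self.rows.get(t, []):
            if row.deleted_txid is None:
                row.deleted_txid = txid
        return encode_zookie(txid)

    def visible_at(self, t: RelTuple, snapshot: int) -> bool:
        # Visible if created at or before the snapshot and not deleted as of the snapshot.
        return any(
            r.created_txid <= snapshot
            and (r.deleted_txid is None or r.deleted_txid > snapshot)
            for r in self.rows.get(t, [])
        )


class Checker:
    """Prefers a cached (possibly stale) snapshot, but never reads older than a client's zookie."""

    def __init__(self, store: SingleWriterStore) -> None:
        self.store = store
        self.cached_snapshot = 0

    def refresh(self) -> None:
        self.cached_snapshot = self.store.last_txid

    def check(self, t: RelTuple, zookie: Optional[str] = None) -> bool:
        snapshot = self.cached_snapshot
        if zookie is not None and snapshot < decode_zookie(zookie):
            # The cache predates the client's write, so read at the latest committed txid instead.
            snapshot = self.store.last_txid
        return self.store.visible_at(t, snapshot)


if __name__ == "__main__":
    store = SingleWriterStore()
    checker = Checker(store)
    bob = ("doc:readme", "viewer", "user:bob")
    store.write(bob)          # txid 1: Bob can view the doc
    checker.refresh()         # checker caches snapshot 1
    z = store.delete(bob)     # txid 2: Bob removed before new content is added
    print(checker.check(bob))     # True: the stale snapshot still sees Bob (the race)
    print(checker.check(bob, z))  # False: the zookie forces a snapshot >= txid 2
```

The single-writer assumption is what makes this simple: with one primary handing out ids, "at least as fresh as the zookie" is a plain integer comparison. In a real database you would still need to make sure the id you hand back reflects commit order (in this sketch that holds trivially because writes are serialized), and with multiple writers you need something like Spanner's commit timestamps instead.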
The fact they did it that way is actually a perfect example of why Google is considered so far ahead of competitors technologically and operationally by their engineers. When you have a powerful building block like Spanner that engineers can use, they then can work on the product instead of wasting time on brittle consistency models, custom storage layers, and providing their own uptime guarantees.

This goes for every part of their stack. As a result, things like Colossus, Bigtable, and Spanner effectively act like force multipliers for their engineers, because they provide the guarantees they can't get elsewhere. The fact other people at other random companies can't do that? Not their problem in the slightest, actually.

It's been many years, but a downside back when I worked there was infrastructure churn. Migrating off deprecated infrastructure meant you had to do a lot of work just to stay where you are. Mostly unstaffed products (like Google Reader, say) were at risk of going under due to technical debt.

When App Engine launched, that was great for me because I could write internal tools that were mostly off the treadmill. Unless you used one of App Engine's less-used APIs (which themselves eventually got deprecated), your more obscure team-specific services could keep running.

So, lots of great technology is not necessarily great for productivity. I don't know what's happened since. I expected that launching Cloud would result in more mature infrastructure because external customers won't tolerate churn as much. I guess it's sort of true?

They updated the churn policy to require infrastructure teams to migrate their users, not just dump the work on them. That greatly eased the unfunded mandate load on product teams. That hasn't stopped infrastructure teams from making sweeping changes, though. There's one in particular happening now that's enormous -- to riff on the "changing the engines midflight" analogy, it's replacing the fuselage without anyone noticing.
Which one is that? Feel free to just hint or PM me.
I worked for a place that didn't value solid engineering in this way, and our systems were always so janky and half-broken. I don't expect them to build Spanner, but being able to migrate certain huge tables at all would be nice. But no, they always balked at spending time building engineering tools. Even a solid job execution system would be cool, but no, janky it is.
I think that view might be a bit out of date.

Most of that stuff is available to external users in Google Cloud; so why isn’t Google Cloud more popular? I don’t have hard numbers handy, but it seems to me that GCP is behind both AWS and Azure in terms of dev mindshare.

GCP has plenty of great tools, but it can also be quite awkward to use, and it’s lacking some useful stuff like lightweight edge functions.

> but it seems to me that GCP is behind both AWS and Azure in terms of dev mindshare

AWS has first-mover advantage and the biggest ecosystem.

Azure has Microsoft behind it = everyone who used Office, SharePoint, SQL Server, C#, etc. and wanted to move to the cloud.

Google doesn't have such a backing. Oracle cloud (huge growth) might be stronger in that sense.

I agree, but it’s surprising that things are that way, especially as they’re not short of cash. If they have this big tech edge over their competitors, where’s the benefit?
> If they have this big tech edge over their competitors, where’s the benefit?

Most of it is behind huge paywalls compared to their competitors. There's e.g. no serverless version of Spanner that is pay-per-usage. Same with Bigtable. Even the newly released AlloyDB has a huge starting cost.

If no one knows about the advantage and no one can feel it, it doesn't really help.

A lot of stuff is proprietary, so adopting it as a third party can be much riskier. While Google Cloud does have a pretty solid deprecation policy, it still means that you are locked into Google Cloud and whatever its future is. At least internal users can escalate if there are serious issues.
Ironically, by the time Spanner became generally available, Google had largely lost their appetite for launching new products.
Why is it lazy? Seems like leveraging a tool Google built for distributed systems specifically for consistency guarantees.
As I understood it from context, the word lazy was being used to complain that the reference to Spanner wasn’t in-lined.
Right. As someone who's not a systems guru, I would love some insight if/how the consistency guarantees can be achieved using common distributed database approaches.