by jen20 1940 days ago
I spent the first months of 2020 building out a database-as-a-service offering that runs in AWS, Azure and GCP (think the Cockroach Cloud or MongoDB Atlas model, but for a different database).

That was an instructive project: building the same service in three clouds tells you a lot about:

- The quality and completeness of foundational services (identity, networking, compute, storage)

- The tooling ecosystem (the quality of the Packer builders and Terraform providers [1] in our case)

- How helpful (or even existent) support is. This ranged from an account manager telling us up front “here’s the way to avoid hitting limits for your design”, to not being able to talk to a human at all throughout the entire project, which meant we had to phase in beta-customer onboarding for that cloud because of its arbitrary limits.

At some point that team should write a full retrospective on this.

[1]: Disclaimer - I have worked on both Packer and Terraform in the past at HashiCorp.


So, what are your experiences?
These are personal opinions, based on the project I outlined (and not what I work on now, necessarily!).

Technically:

- Google has the most reliable network, compute and storage (for a given size).

- AWS has the only comprehensible security model for identity, although it's still not complete (e.g. I can't grant a role assigned to an instance profile permission to call `DescribeInstances` on its own instance only). I strongly believe IAM is the crown jewel of AWS, but I wish it would be completed to its full potential.

- Google has the best "organisations" structure overall, though AWS Organizations is vastly improved over what it used to be.

- Azure's model for network peering between networks in different tenants is complete crazy town and will certainly result in outages when a customer disables the service account required to maintain it.

- Provisioning times in Azure are wildly variable - provisioning a VM with the same image in the same zone often had minutes of difference between fastest and slowest. The other two are much more consistent.

- The Terraform provider for Google is missing many data sources, and almost every type of resource we used needed patching in some important way.

- We had to build "surrogate" Packer builders for Google and Azure to automate scratch-building ZFS-on-root Ubuntu images with our platform customisations. I originally built the AWS version of that builder, so that was not much of a surprise.
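To make the IAM gap from the point above concrete, here is a minimal sketch of the closest policy an instance-profile role can get. It assumes EC2's documented behaviour that `Describe*` actions do not support resource-level permissions, so the `Resource` element must be `"*"`: there is no ARN that narrows the grant to the calling instance, and the role ends up able to describe every instance rather than only its own. The function name `describe_instances_policy` is hypothetical, for illustration only.

```python
import json


def describe_instances_policy() -> dict:
    """Build an IAM policy document allowing ec2:DescribeInstances.

    Hypothetical helper. Assumes EC2 Describe* actions do not support
    resource-level permissions, so Resource cannot name a single instance.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:DescribeInstances",
                # A wildcard is required here, so the grant covers every
                # instance visible to the account, not just the caller's own.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Emit the JSON you would attach to the instance-profile role.
    print(json.dumps(describe_instances_policy(), indent=2))
```

Attaching this to the role works, but it is strictly broader than "describe yourself", which is the incompleteness being described.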

From a support perspective:

- AWS and Azure were very willing to work with us in getting our service up and running even though we weren't spending a huge amount in the development phase, and it was easy to get in touch with someone to explain what we were doing and request advice.

- It was impossible to speak to a human at Google. Experiencing the Kafkaesque automated account policies (e.g. "you can have enough cores to actually bring up a database cluster when you pay your invoice, but we haven't issued an invoice yet because your account isn't a month old; and no, despite being a company with a multi-year trading history, you can't just put money on deposit to prove trustworthiness") actually prompted me to move my personal accounts off G Suite in case a problem ever arose.

Sadly, the technical excellence of GCP in several important areas did not (for me) make up for the fact that they are impossible to work with as a small business doing something that is not strictly happy-path.

What about the "world's unambiguous IaaS champion" (https://github.com/pulumi/pulumi/issues/6446) Oracle Cloud??
That issue serves to prove that the line between "highly skilled troll" and "exceptionally earnest individual" is very fine!