Front Door, firewalls, LBs, AKS, Azure Functions, Azure Disk (in AKS), scale sets and file-share storage accounts.
All of it was painful. Their docs were lacking or sent you in circles. The documentation for their firewall product is three-quarters known issues and errors.
Azure Functions were painful to run locally, and God forbid you’re not using Windows.
Azure would take ages to attach a node on K8s. Like, over an hour. It consistently had issues moving Azure disks between nodes in K8s: “can’t mount disk, attached to another host”. In comparison, AWS will speedily and happily re-attach an EBS disk to a new machine.
Permissions were opaque and distributed across the whole interface.
It silently deprecated keys underneath us, which broke a number of services (couldn’t write to attached disks in K8s, couldn’t move them), and didn’t inform us that this had happened; we only figured it out by trawling through GitHub issues.
Storage Account explorer application breaks/stops consistently.
“Alert but permit” mode on firewalls doesn’t do what it’s supposed to: it will permit, but totally fails to alert you.
Scale sets operate weirdly. I didn’t personally deal with this too much, but my teammates had consistent problems with strange caching issues and with more or fewer machines being spun up than there should have been.
Until we fixed it, every Azure PoP was health-checking our web app 2-3 times a minute: our logs and servers were being flooded with literally thousands of pointless requests.
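To put a rough number on that, here is a back-of-the-envelope sketch. The PoP count of 100 is a made-up figure for illustration only, not something measured; the 2-3 checks per minute is the rate mentioned above.

```python
def health_check_requests(pops: int, checks_per_minute: float, minutes: int = 60) -> int:
    """Total health-check requests hitting the app over the given window."""
    return round(pops * checks_per_minute * minutes)

# Hypothetical: 100 PoPs, each probing 2 to 3 times a minute, over one hour.
HYPOTHETICAL_POPS = 100
low = health_check_requests(HYPOTHETICAL_POPS, checks_per_minute=2)
high = health_check_requests(HYPOTHETICAL_POPS, checks_per_minute=3)
print(f"{low}-{high} health-check requests per hour")
```

Even with a modest guess at the number of PoPs, the per-hour total lands in the tens of thousands.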
If you have an AKS cluster with n machines currently in it, with a minimum and maximum of (n, m) machines, and you want to, say, increase the minimum number, you cannot: it will refuse and tell you “the minimum number of nodes must include the current number”. So rather than just automatically adding a new node (à la AWS, and I presume GCP), you have to force the cluster to scale up to the new number of machines by throwing workload at it, then make the change.
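The behaviour described above can be sketched as a toy model. This is not the Azure API or CLI; `NodePool`, `set_min` and `scale_to` are hypothetical names, and the check simply mirrors the rule as experienced: a new minimum above the current node count is rejected.

```python
class NodePool:
    """Toy model of the autoscaler limits described above (not a real Azure client)."""

    def __init__(self, current: int, min_count: int, max_count: int):
        self.current = current
        self.min_count = min_count
        self.max_count = max_count

    def set_min(self, new_min: int) -> None:
        # Mirrors the refusal: the minimum cannot exceed the current node count.
        if new_min > self.current:
            raise ValueError("the minimum number of nodes must include the current number")
        self.min_count = new_min

    def scale_to(self, count: int) -> None:
        # Stands in for "throwing workload at it" until the autoscaler adds nodes.
        if not self.min_count <= count <= self.max_count:
            raise ValueError("count outside autoscaler limits")
        self.current = count


pool = NodePool(current=3, min_count=3, max_count=6)

try:
    pool.set_min(4)          # what you actually want to do
    rejected = False
except ValueError:
    rejected = True          # refused, because only 3 nodes exist right now

pool.scale_to(4)             # workaround: force the cluster up first
pool.set_min(4)              # now the same change is accepted
```

The point of the sketch is the ordering: the change you want only goes through after the cluster has already been pushed to the new size by other means.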
AWS has a single Python package called “Boto3”, from which you can do pretty much everything. Microsoft in their infinite wisdom has a separate Python package for every service, and sometimes for a subset of a service. Do you know whether you need the package for Storage Accounts, File Share, Share Accounts, Object Store or whatever else they had? Also, authenticating against these was a pain: sometimes you need a key provided by the service (let’s hope your permissions let you see that), sometimes you need to generate a service principal for your app (unless there is already one? In which case it’s listed in the UI, but nowhere you’ll find it, and certainly not under “service principals”, and you probably won’t have the permissions to see the information you need anyways) and then sometimes you need both!
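To make the package sprawl concrete, here is a small sketch listing some of the storage-related packages on PyPI next to the single AWS one. The service labels and the `pip_packages` helper are mine, for illustration; the list is not exhaustive, and which packages a given project needs is an assumption you would have to check against your own setup.

```python
# One package covers AWS.
AWS_PACKAGE = "boto3"

# Storage alone is split across several Azure packages: (pip name, import path).
AZURE_STORAGE_PACKAGES = {
    "blob": ("azure-storage-blob", "azure.storage.blob"),
    "file share": ("azure-storage-file-share", "azure.storage.fileshare"),
    "queue": ("azure-storage-queue", "azure.storage.queue"),
    "data lake": ("azure-storage-file-datalake", "azure.storage.filedatalake"),
}

# Service-principal style authentication lives in yet another package.
AZURE_AUTH_PACKAGE = "azure-identity"


def pip_packages(services: list[str]) -> list[str]:
    """Return the sorted pip package names needed for the given storage services."""
    names = {AZURE_STORAGE_PACKAGES[s][0] for s in services}
    names.add(AZURE_AUTH_PACKAGE)
    return sorted(names)


needed = pip_packages(["blob", "file share"])
print(needed)
```

Two storage features plus authentication already means three separate installs, versus one for the AWS side.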
Azure let us spin up a K8s cluster on a version of 1.18, but then didn’t let us scale the cluster a few weeks later, because apparently that version just didn’t exist. So we should either use 1.17, or update to a newer version of 1.18, but you can’t skip point-releases, so you’re going to have to update everything in your cluster before you can have another node.