Hacker News new | ask | show | jobs
by gwynforthewyn 358 days ago
I’ve worked with Azure for a few years now, AWS and classic data centres for 15 years before that.

It’s pretty clear if you check github that Azure’s services and documentation are written by distributed teams with little coordination. We have a saying in-house that the info is all in their docs, but the sentences and paragraphs for even trivial things are split across ten or fifteen articles.

I see a problem like granting */read in an innocuously named role and am left wondering if it was pragmatism, because figuring out least privilege was tough, or a junior who didn’t know better and was just trying to make progress.

I’m on a phone and can’t search git effectively, but I’d swear there was a comment or note on the golang implementation of msal saying that it used non-idiomatic go with no real support for many of the auth flows in v1 because it was written by an enthusiastic junior dev and released with little review. The modern version looks better, but I felt like I got a window into Azure back when I read that.

Building large services is hard, my hat is off that Microsoft is making it work, but sometimes we get to see that it’s just teams of developers doing it for them and those teams look a lot like the teams we work with every day. There’s no secret sauce, except that MS has the resources to iterate until the thing mostly works most of the time.

5 comments

> It’s pretty clear if you check github that Azure’s services and documentation are written by distributed teams with little coordination.

I've come to the same conclusion after dealing (and reporting) jankyness in both the Azure (ARM) API and especially the CLI. [0] is a nice issue I look at every once in a while. I think an installed az cli is now 700 MB+ of Python code and different bundled python versions...

[0]: https://github.com/Azure/azure-cli/issues/7387

Why do all these use Python? AWS, GCP, Azure, all three CLIs use Python; they're slow, bloated, heavy to install... what advantage does Python really offer here? You can't in any sensible way rely on it being installed (in your linked issue we see that they actually bundle it) so it's not even an 'easy' runtime.
Python takes up less than 16 MB on disk (python3.11-minimal + libpython3.11-stdlib on Debian) so whatever Microsoft did to make their Azure CLI package take up almost 700 MB, I don't think the language is the problem.
They bundle versioned API schemas....looooots of them.

It would be a garbage fire in any language

It might well be part of the problem. Certainly any language can be inefficient, especially in terms of size, if you don't pay attention (have certainly found this with Go recently). But as I said it's also slow (interpreting code, or dealing with cached versions of it) and it's not obvious to me why all three major cloud CLIs have chosen it over alternatives.
I don't understand the Python hate. What would they use instead?

Python is installed on most systems and easy to install when it's not. Only Azure is dumb enough to bundle it, and that was a complaint in the bug - there's no good reason to do so in this day and age.

The performance bottle neck in all three is usually the network communication - have you seen cases where the Python CLI app itself was using 100% of a CPU and slowing things down? I personally haven't.

Looking at the crazy way Azure packaged their CLI, it's hard to believe they weren't making it bloated on purpose.

> Python is installed on most systems (...)

Not on Windows.

And which Python are you talking about? I mean, Python3 is forward compatible but you SoL if you have the bad luck of having an older interpreter installed and you want to run a script which uses a new construct.

I don't understand why Windows people are completely okay having to install all kinds of crazy service packs and Visual C++ runtimes anytime they install anything, but then having to install Python seperately makes it a no-go.
A type safe and memory safe language, like rust? Or their own c# perhaps?
Python is a memory and typesafe language.

Also, AWS is 10 years older than Rust, and C# only runs on Windows (at least it certainly only did when AWS was created, and is laughably more difficult to get running on Linux or OSX than Python).

> Python is a ... typesafe language.

You're funny

> is laughably more difficult to get running on Linux or OSX than Python).

$(dotnet publish) <https://learn.microsoft.com/en-us/dotnet/core/deploying/sing...> is the way they solve that problem in modern .net

And, unlike the scripting language you mentioned, most of the languages on the CLR actually are statically typed

It’s legitimately fun to see people gaining hope something would happen about this and then losing hope, again and again. Thanks for the laugh.

This is how you can tell that people doing systems work aren’t running the sdk project. A gig dependency for a few python scripts is hard to swallow.

> It’s pretty clear if you check github that Azure’s services and documentation are written by distributed teams with little coordination. We have a saying in-house that the info is all in their docs, but the sentences and paragraphs for even trivial things are split across ten or fifteen articles.

You can say that for the APIs themselves. It's like every API call has 80% of the info I want, but the other 20% that logically belongs with that 80% has to come from multiple other API calls.

The one that annoys me on a daily basis is fetching the logs for a pipeline run. The endpoint is `_apis/build/builds/<id>/logs` and it returns a list of pipeline log objects without the task name that generated them. You get an object with these fields: `{"lineCount", "createdOn", "lastChangedOn", "id", "type", "url"}` but no mention of the pipeline stage that generated it.. whether it's the build step, test run, publishing stage, etc. And those ids change (for example if you re-run a failed job, the unit tests may have ID 4 from the first run, and ID 17 for the second try), so you can't just rely on that.

And the pipeline log viewer on the website is garbage. When you click the link to view the logs it doesn't show you the logs it's collected already but starts showing new logs from that point forward and even for that, sometimes it truncates output and will skip lines. Somehow they managed to make trawling through logs even worse than it would normally be.

  … it was written by an enthusiastic junior dev and released with little review
This feels true of so many Windows applications. Super rough POC that then gets released and locked into stone forever.
The new Notepad would hang for minutes if you used it to open a large text file. It also stuttered when scrolling. It’s incredible to see something so low quality make it into a core operating system app release.
> a junior who didn’t know better and was just trying to make progress

While totally plausible, that's kinda besides the point IMO. This shows regardless of how it happened, they don't have sufficient test coverage of these roles. Meaning built-in roles cannot be trusted.

AWS documentation is similarly bad: I used to joke that it was all written down to remind the service team of something rather than as something that is useful for users to read in advance and understand the service.