Hacker News
by _8j50 813 days ago
For key rotation, it may not be as simple as it sounds. I expect better from MS as well, but take on-prem AD as an example: the krbtgt account password should be rotated yearly, but in practice rotation carries a huge risk of outages for accounts that depend heavily on it for Kerberos ticketing. I don't know the details, but knowing MS, they may have copied the key distribution design of Kerberos over to Azure AD (hence the "skeleton key"), and that may be why they didn't rotate it frequently.

For the latter issue you mentioned as well, it may be caused by fear of outages. The people implementing the design may have opted for a soft notification to the right people when the key expired but wasn't renewed instead of refusing to validate tokens and causing a global outage affecting every cloud service for every customer.
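
The trade-off described above can be sketched in a few lines. This is only an illustration of the two policies (refuse tokens vs. keep accepting them and notify someone), not a description of how Azure AD actually behaves; the function name, the `fail_closed` flag, and the alert strings are all made up for the example.

```python
from datetime import datetime, timezone


def check_signing_key(key_expiry, now, fail_closed):
    """Return (accept_tokens, alert) for a signing key with the given expiry.

    Hypothetical policy sketch: nothing here reflects a real Microsoft API.
    """
    if now < key_expiry:
        # Key is still valid: accept tokens, nothing to report.
        return True, None
    if fail_closed:
        # Strict policy: an expired key stops validating tokens (outage risk).
        return False, "signing key expired: rejecting tokens"
    # Soft policy: keep validating, only notify the owners.
    return True, "signing key expired: still accepting tokens, notify owners"


expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
later = datetime(2030, 6, 1, tzinfo=timezone.utc)
print(check_signing_key(expiry, later, fail_closed=True))
print(check_signing_key(expiry, later, fail_closed=False))
```

The soft policy avoids the outage, but it only helps if someone acts on the alert; otherwise the expired key stays trusted indefinitely.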

Hindsight is always 20/20, but why didn't any government, organization or institution require a third-party audit of MS prior to this? And how special is MS in its design compared to GCP or AWS? What is MS's response to the findings?

I have a pet peeve about people who show up at an organization and find that everything is done wrong, without getting into the nuances and root causes, so they can capitalize on the supposed failures for fame and glory. I don't know if that is the case here, and certainly MS's security track record and MSRC's response record are horrible, but I am taking this report with a grain of salt.

The government does need to twist MS's arm a lot, in my opinion. I've done an objective comparison of cloud provider security capabilities and Azure's is the worst by a large margin: too much nickel-and-diming to charge customers more for security.

4 comments

> I've done an objective comparison of cloud provider security capabilities and Azure's is the worst by a large margin [...]

could you say a little more about this, if only to list some security-related functionality that's default or comes with 'base' licensing in other public clouds, but that Microsoft offers only as an add-on? probably a fair list considering the sheer number of tier and add-on SKUs. but anything specific that's particularly egregious?
As a random example, they charge customers to store audit logs. That would be “fine”, except that they charge something like 7x what AWS does for the equivalent service. The AWS pricing is already what I would call “too high”, which makes Azure’s log analytics pricing highway robbery. It can cost more than the VMs it is auditing!

Another fun problem is that their audit logs only log the identity of the person that triggered the event about 50% of the time. In many cases they mask or drop this field, which is the most important piece of data in such a log!

So for example a developer in our org pressed a button in an Application Insights troubleshooting wizard when his app ran out of memory. This “helpfully” doubled the size of an already huge server pool that had a reservation in it. We ended up paying $15K extra that month and never figured out who did it because the logged identity was some internal service account!

Oh, you paid 15k more? I'd say that's a feature.
Me? No. The government did… with your taxes.
Can't share details, but we basically listed MITRE tactics and what out-of-the-box detection/prevention/logging each CSP provides.
anything nonspecific enough to share re: results? how'd/s Azure fare relative?
Basically, they had products for most categories, but unlike other CSPs they were paid and optional, and the payment model makes it hard to predict cost. Applying those features across a large number of subscriptions is also not a trivial task. You can compare ASC alerts vs SCC (GCP) yourself and see.
> it may be caused by fear of outages

My only experience with anything close to this is website SSL certs. Back in the day, we used to renew certs from once a year, to as long as once every five years. It was somewhat normal for certs to expire and things to go awry. Then Let's Encrypt came along with certs that expire in 90 days. I believe the thinking was that a shorter period would ensure that systems and org processes were always ready for certificate regeneration, to avoid outages.
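
A minimal sketch of the "renew early and often" idea, assuming a 90-day certificate and a renewal window of 30 days before expiry. The 30-day window and the function name are assumptions for the example, not something stated in the thread.

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_after, now, renew_before=timedelta(days=30)):
    """True once we are inside the renewal window before the cert expires."""
    return now >= not_after - renew_before


issued = datetime(2030, 1, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=90)  # 90-day certificate lifetime

for day in (30, 59, 60, 89):
    now = issued + timedelta(days=day)
    print(day, should_renew(not_after, now))
```

With these numbers the check starts returning True on day 60, so a renewal job that runs daily gets about 30 days of retries before anything actually expires.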

My question is: in the case of Azure AD, is a system where rotating a key would cause an outage a bad design, and is that avoidable?

note: Please let me know if I am using any incorrect terminology, or not understanding a basic concept, in the interest of learning.

Maybe it is avoidable, I don't know, but this isn't a random website. Consider your Let's Encrypt example: they don't rotate the root CA cert every few months (I think it's several years? A decade+?). Or take any root CA cert, or the DNSSEC root signing key (it's a big deal, there is this whole ceremony about it).

The rotation isn't what stands out to me; it's the fact that the secret material wasn't on some HSM. Rotation can be tricky, but why allow applications read access to the private key material at all?
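
To make the HSM point concrete, here is a toy sketch of the interface shape: callers get a `sign` operation, never the key bytes. Assumptions: `make_signer` is a made-up name, and HMAC-SHA256 stands in for the asymmetric signing a real token service would use. Python cannot actually enforce this boundary (the key is still in process memory); a real HSM enforces it in hardware.

```python
import hashlib
import hmac
import secrets


def make_signer():
    """Return (sign, verify) callables; the key itself is never returned."""
    key = secrets.token_bytes(32)  # generated inside, not exposed to callers

    def sign(message: bytes) -> bytes:
        return hmac.new(key, message, hashlib.sha256).digest()

    def verify(message: bytes, tag: bytes) -> bool:
        # Constant-time comparison against a freshly computed tag.
        return hmac.compare_digest(sign(message), tag)

    return sign, verify


sign, verify = make_signer()
tag = sign(b"token-payload")
print(verify(b"token-payload", tag))
print(verify(b"tampered-payload", tag))
```

An application holding only `sign` can be abused while it is compromised, but it cannot walk away with the key and keep minting tokens afterwards.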

> the krbtgt account should be rotated yearly but in practice, it carries a huge risk of outages for accounts

Bad example: the krbtgt password needs to be rotated twice because the old one is stored as well, precisely to avoid outages.
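
The "current plus previous key" idea can be sketched as a small key ring. This is a toy model, not how AD stores or derives krbtgt keys: the `KeyRing` class is hypothetical and HMAC-SHA256 stands in for ticket encryption.

```python
import hashlib
import hmac


class KeyRing:
    """Keeps the current key and the one before it (toy model)."""

    def __init__(self, initial_key: bytes):
        self.current = initial_key
        self.previous = None

    def rotate(self, new_key: bytes) -> None:
        # The old current key stays around for one more rotation.
        self.previous = self.current
        self.current = new_key

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self.current, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        # Accept anything signed with the current or the previous key.
        for key in (self.current, self.previous):
            if key is None:
                continue
            expected = hmac.new(key, message, hashlib.sha256).digest()
            if hmac.compare_digest(expected, tag):
                return True
        return False


ring = KeyRing(b"key-1")
ticket = ring.sign(b"ticket")
ring.rotate(b"key-2")
print(ring.verify(b"ticket", ticket))  # old ticket still accepted
ring.rotate(b"key-3")
print(ring.verify(b"ticket", ticket))  # key-1 is gone after the second rotation
```

One rotation does not break outstanding tickets, and it takes a second rotation to fully retire the old key, which is why the two rotations are normally spaced apart.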

> For key rotation, it may not be as simple as it sounds. I expect better from MS as well but for example, for on-prem AD, the krbtgt account should be rotated yearly but in practice, it carries a huge risk of outages for accounts that depend on it a lot for kerberos ticketing.

If only there were internal development resources that Microsoft could leverage to build a more robust system, maybe one that allows for phasing in of new keys, and not have to wait on external vendors to get around to improving security like the rest of us do.

In hindsight, yeah, they could have done better, but I suspect they were focused on migration to cloud from on-prem, and building a whole new robust directory system wasn't top priority. I doubt it is now either, unless the government twists their arms. They instead rebranded as Entra ID lol.