Microsoft blamed for "a cascade of security failures" in Exchange breach report (arstechnica.com)
120 points by alexandreyc 812 days ago
7 comments

The linked story from 2023 has insane details. I’m pretty sure I had heard this before, but blocked it out due to some sort of normalcy bias.

This plus the latest State Dept. hack deserves pulling the CEO in front of Congress. Famously, there used to be a saying at Microsoft, something like “Don’t get Bill pulled in front of Congress,” meant to keep people from making bad decisions. That should be a thing again.

> He also faulted Microsoft for waiting five years to refresh the signing key abused in the attacks, saying best practices are to rotate keys more frequently. He also criticized the company for allowing authentication tokens signed by an expired key, as was the case in the attack.

https://arstechnica.com/security/2023/08/microsoft-cloud-sec...

For key rotation, it may not be as simple as it sounds. I expect better from MS as well, but for example, with on-prem AD the krbtgt account should be rotated yearly, yet in practice doing so carries a huge risk of outages for accounts that lean heavily on it for Kerberos ticketing. I don't know the details, but knowing MS, they may have copied Kerberos's key distribution design over to Azure AD (hence the "skeleton key"), and that may be why they didn't rotate it frequently.

For the latter issue you mentioned as well, it may be caused by fear of outages. The people implementing the design may have opted for a soft notification to the right people when the key expired but wasn't renewed instead of refusing to validate tokens and causing a global outage affecting every cloud service for every customer.

Hindsight is always 20/20, but why didn't any government, organization or institution require a third-party audit of MS before this? And how special is MS's design compared to GCP's or AWS's? What is MS's response to the findings?

I have a pet peeve about people who show up at an organization and declare that everything is done wrong, without getting into the nuances and root causes, so they can capitalize on the supposed failures for fame and glory. I don't know if that is the case here, and MS's security track record and MSRC's response record are certainly horrible, but I am taking this report with a grain of salt.

The government does need to twist MS's arm a lot, in my opinion. I've done an objective comparison of cloud provider security capabilities, and Azure's is the worst by a large margin: too much nickel-and-diming to charge customers more for security.

  I've done an objective comparison of cloud provider security capabilities and Azure's is the worst by a large margin [ . . . ]
could you say a little more about this, if only to list some security-related functionality that's default or comes with 'base' licensing in other public clouds but that Microsoft offers only as an add-on? probably a fair list considering the sheer number of tier and add-on SKUs. but anything specific that's particularly egregious?
As a random example, they charge customers to store audit logs. That would be “fine”, except that they charge something like 7x what AWS does for the equivalent service. The AWS pricing is already what I would call “too high”, which makes Azure’s log analytics pricing highway robbery. It can cost more than the VMs it is auditing!

Another fun problem is that their audit logs only log the identity of the person that triggered the event about 50% of the time. In many cases they mask or drop this field, which is the most important piece of data in such a log!
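If you want to measure the attribution gap described above in your own exported logs, a rough sketch is to bucket each event by whether it names a person, a service account, or nobody. The field name `caller` and the service-account prefixes are assumptions for illustration; check what your own export actually contains.

```python
from collections import Counter
from typing import Iterable, Mapping


def identity_coverage(events: Iterable[Mapping[str, object]],
                      service_prefixes: tuple[str, ...] = ("svc-", "internal-")) -> dict[str, float]:
    """Share of audit events attributed to a person, to a service account, or to nobody."""
    counts: Counter[str] = Counter()
    for event in events:
        caller = str(event.get("caller") or "").strip()  # "caller" is an assumed field name
        if not caller:
            counts["missing"] += 1
        elif caller.startswith(service_prefixes):
            counts["service_account"] += 1
        else:
            counts["attributed"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts[k] / total for k in ("attributed", "service_account", "missing")}
```

Events attributed only to an internal service account, like the autoscaling change in the next comment, land in the `service_account` bucket rather than counting as properly attributed.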

So for example a developer in our org pressed a button in an Application Insights troubleshooting wizard when his app ran out of memory. This “helpfully” doubled the size of an already huge server pool that had a reservation in it. We ended up paying $15K extra that month and never figured out who did it because the logged identity was some internal service account!

Oh, you paid 15k more? I'd say that's a feature.
Me? No. The government did… with your taxes.
Can't share details, but we basically listed MITRE tactics and what out-of-the-box detection/prevention/logging each CSP provides for them.
anything nonspecific enough to share re: results? how did (or does) Azure fare in comparison?
Basically, they had products for most categories, but unlike other CSPs they were paid and optional, and the payment model makes cost hard to predict. Applying those features across a large number of subscriptions is also not a trivial task. You can compare Azure Security Center (ASC) alerts vs. Security Command Center (SCC) on GCP yourself and see.
> it may be caused by fear of outages

My only experience with anything close to this is website SSL certs. Back in the day, we renewed certs anywhere from once a year to as rarely as once every five years. It was fairly normal for certs to expire and things to go awry. Then Let's Encrypt came along with certs that expire in 90 days. I believe the thinking was that a shorter period would ensure that systems and org processes were always ready for certificate regeneration, avoiding outages.
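A short lifetime only prevents outages if renewal is automated and starts well before expiry. A minimal sketch of that scheduling logic, assuming renewal begins once two thirds of the validity period has elapsed (an illustrative policy chosen here, not a quoted Let's Encrypt rule); the function names are hypothetical.

```python
from datetime import datetime, timedelta


def renewal_due(not_before: datetime, not_after: datetime,
                renew_fraction: float = 2 / 3) -> datetime:
    """When to start renewing: once renew_fraction of the validity period has elapsed."""
    if not 0 < renew_fraction < 1:
        raise ValueError("renew_fraction must be between 0 and 1")
    return not_before + (not_after - not_before) * renew_fraction


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime,
                  renew_fraction: float = 2 / 3) -> bool:
    return now >= renewal_due(not_before, not_after, renew_fraction)


# Example: a 90-day certificate becomes due for renewal around day 60,
# leaving roughly 30 days of slack for retries if renewal fails.
issued = datetime(2024, 1, 1)
expires = issued + timedelta(days=90)
due = renewal_due(issued, expires)
```

Because this runs several times a year, a broken renewal pipeline surfaces weeks before anything expires, which is the point the comment makes about short lifetimes.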

My question is: in the case of Azure AD, is a system designed so that rotating a key would cause an outage a bad design, and an avoidable one?

note: Please let me know if I am using any incorrect terminology, or not understanding a basic concept, in the interest of learning.

Maybe it is avoidable, I don't know, but this isn't a random website. Consider your Let's Encrypt example: they don't rotate the root CA cert every few months (I think it's several years? A decade+?). The same goes for any root CA cert, or the DNSSEC root signing key (that one is a big deal; there is a whole ceremony around it).

The rotation isn't what stands out to me, it's the fact that the secret material wasn't on some HSM. Rotation can be tricky but why allow applications read access to the private key material at all.
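The point about HSMs is that applications get a "sign this" operation, not the key itself. A toy sketch of that interface shape, assuming HMAC-SHA256 from the standard library; `NonExportableSigner` is a hypothetical name. Python cannot actually hide the attribute, so this only illustrates the API contract that a real HSM enforces in hardware. Real token signing would use an asymmetric key so that verifiers only ever need the public half.

```python
import hashlib
import hmac
import secrets


class NonExportableSigner:
    """Toy stand-in for an HSM-held key: callers can sign, but there is no export call."""

    def __init__(self, key_id: str) -> None:
        self.key_id = key_id
        # Generated internally; never accepted from or returned to callers.
        self._key = secrets.token_bytes(32)

    def sign(self, payload: bytes) -> bytes:
        return hmac.new(self._key, payload, hashlib.sha256).digest()

    def verify(self, payload: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(payload), signature)

    def __repr__(self) -> str:
        # Keep key material out of logs and debug output.
        return f"NonExportableSigner(key_id={self.key_id!r})"
```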

> the krbtgt account should be rotated yearly but in practice, it carries a huge risk of outages for accounts

Bad example: the krbtgt password has to be rotated twice because the previous one is kept as well, precisely to avoid outages.
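The "rotate twice" behaviour follows from keeping a current and a previous key and accepting tickets signed by either. A minimal model of that idea, using HMAC-SHA256 tags in place of real Kerberos ticket encryption (an assumption for brevity); `TwoSlotKeyring` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


class TwoSlotKeyring:
    """Current + previous key, loosely modelled on krbtgt keeping its prior password."""

    def __init__(self) -> None:
        self.current = secrets.token_bytes(32)
        self.previous: bytes | None = None

    def rotate(self) -> None:
        # The old current key moves to the previous slot, so tickets it issued stay valid.
        self.previous = self.current
        self.current = secrets.token_bytes(32)

    def issue(self, ticket: bytes) -> bytes:
        return hmac.new(self.current, ticket, hashlib.sha256).digest()

    def validate(self, ticket: bytes, tag: bytes) -> bool:
        for key in (self.current, self.previous):
            if key is not None and hmac.compare_digest(
                    hmac.new(key, ticket, hashlib.sha256).digest(), tag):
                return True
        return False
```

One rotation keeps existing tickets working; only the second rotation pushes the old key out of both slots. Waiting at least the maximum ticket lifetime between the two rotations is what avoids the outage.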

>For key rotation, it may not be as simple as it sounds. I expect better from MS as well but for example, for on-prem AD, the krbtgt account should be rotated yearly but in practice, it carries a huge risk of outages for accounts that depend on it a lot for kerberos ticketing.

If only there were internal development resources that Microsoft could leverage to build a more robust system, maybe one that allows for phasing in of new keys, and not have to wait on external vendors to get around to improving security like the rest of us do.

In hindsight, yeah, they could have done better, but I suspect they were focused on migrating from on-prem to the cloud, and building a whole new robust directory system wasn't a top priority. I doubt it is now either, unless the government twists their arm. Instead they rebranded it as Entra ID, lol.
Oh, Microsoft has a security failure? Imagine that. Only 40 years of nonstop security failures in its history. Why anyone would use Microsoft products is beyond me. You can only blame yourself. Fool me once, shame on you; fool me for 40 years, shame on me.
I would love to know of a major tech company/product that has NOT had a security failure. This goes double for companies that provide hosting of services that hold juicy personal information.
Yep. I don’t understand why anyone ever uses Exchange; it was a joke 25 years ago, and it’s still a joke.
That's edgy without being interesting. I could replace you with a small script.
Notice how little scrutiny Microsoft has been getting from Congress, the DOJ, the FTC, etc., despite these many huge security blunders and whatever is going on between them and OpenAI.

This might be because it is almost impossible to tell where Microsoft starts and the government ends these days. Also remember that Microsoft was basically the pilot program for Prism.

We're commenting on a government report ripping them to shreds in public, and the FTC has already announced an investigation into the AI shenanigans, and not just OpenAI's.

Please don't self-peasantize or induce it in others.

There are still things that feel murky from reading the CISA report.

For example, it notes that Microsoft do not know for certain how the attacker got in in the first place, but they and the government suspect (see 1.2.4 of the CISA report) it was a compromise of a laptop owned by an employee of Affirmed Networks, which Microsoft bought in 2021.

Are they saying, then, that the attacker was in their network for two years? Or that the attacker was someone able to leap from this laptop to Microsoft's identity systems (which would be very odd, since Affirmed were not in that business, so there would have been no reason for such a laptop to be anywhere close to Azure's insides).

One bright spot in the report, deserving of kudos, is that the folks at the State Department understood their monitoring tools and used them very well to uncover the anomaly that led to the discovery of this compromise.

> Once Microsoft realized that the intruders had used a theoretically expired 2016 consumer signing key to forge tokens for an enterprise customer, it launched an "all-hands-on-deck" investigation that went through the night, June 26–27. The company arrived at 46 hypotheses for the intrusion, including "a theoretical quantum computing capability to break public-key cryptography."
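Besides expiry, the quote describes a key meant for one population (consumer) being accepted for another (enterprise). One check that would catch that kind of mismatch is binding each key to the scope it may sign for and rejecting tokens whose claimed scope differs. The names and scope labels below are hypothetical; this is meant to run in addition to signature and expiry checks, not instead of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyInfo:
    kid: str
    scope: str  # population this key may sign for, e.g. "consumer" or "enterprise"


@dataclass(frozen=True)
class Claims:
    kid: str     # key id taken from the token header
    scope: str   # population the token claims to belong to


def key_scope_ok(claims: Claims, keys: dict[str, KeyInfo]) -> bool:
    """Reject tokens whose signing key is unknown or scoped to a different population."""
    key = keys.get(claims.kid)
    if key is None:
        return False
    return key.scope == claims.scope
```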

I feel like this is a twist on the denial stage of grief. Sure, our house is on fire... but maybe that's because an asteroid just struck the earth.

> 46 hypotheses for the intrusion

The criticism here doesn't seem warranted. At an early stage of an investigation, it seems prudent to enumerate all possibilities, including grey swan events. This then allows them to scale the investigation and delegate each hypothesis to the appropriate experts.

Related official report:

CISA Releases Report on Microsoft Online Exchange Incident from Summer 2023

https://news.ycombinator.com/item?id=39922066

If it’s Boeing …