Hacker News
by red2awn 696 days ago
> How Do We Prevent This From Happening Again?

> Software Resiliency and Testing

> * Improve Rapid Response Content testing by using testing types such as:

> * Local developer testing

So no one actually tested the changes before deploying?!

And why is it "local developer testing" and not CI/CD? This makes them look like absolute amateurs.
> This makes them look like absolute amateurs.

This also applies to all the architects and CTOs at these Fortune 500 companies who allowed these self-updating systems into their critical systems.

I would offer a copy of Antifragile to each of these teams: https://en.wikipedia.org/wiki/Antifragile_(book)

"Every captain goes down with every ship"

Architects likely do not have a choice. These things are driven by auditors and requirements for things like insurance or PCI and it’s expensive to protest those. I know people who’ve gone full serverless just to lop off the branches of the audit tree about general purpose server operating systems, and now I’m wondering whether anyone is thinking about iOS/ChromeOS for the same reason.

The more successful path here is probably demanding proof of a decent SDLC, use of memory-safe languages, etc. in contract language.

> Architects likely do not have a choice.

Architects don't have a choice, CTOs are well paid to golf with the CEO and delegate to their teams, auditors just audit but are not involved with the technical implementation, developers just develop according to the spec, and the security team is just a pain in the ass. Nobody owns it...

Everybody gets well paid, and at the end we have to get lessons learned... It's a s*&^&t show...

Some industries are forced by regulation or liability to have something like CrowdStrike deployed on their systems. And CrowdStrike doesn't have a lot of alternatives that tick as many checkboxes and are as widely recognized.
Please give me an example of that specific regulation.
Seems like everyone thinks that execs play golf with other execs to seal the deal regardless of how b0rken the system is.

That CTO's job is on the line if the system can't meet the requirement, more so if the system is fucked.

To think that every CTO is a dumbass is like saying "everyone is stupid, except me, of course"

Not all CTOs... but you just saw hundreds of companies that could do better...
They don't care. CI/CD, like QA, is considered a cost center for some of these companies. The cheapest thing for them is to offload the burden of testing every configuration onto the developer, who is also going to be tasked with shipping as quickly as possible or getting canned.

Claw back executive pay, stock, and bonuses imo and you'll see funded QA and CI teams.

It sure sounds like the "Content Validator" they mention is a form of CI/CD. The problem is that it passed that validation, but was capable of failing in reality.
The content validator is a form of validation done in CI. Their CD pipeline is the bigger problem here: it was extremely reckless given the system it was used in (configuring millions of customer machines in unknown environments). A CD pipeline for a tiny startup's email service can just deploy straight away. CrowdStrike (as they finally realized) needs a CD pipeline with much more rigorous validation.
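
For illustration, here is a minimal sketch of one thing "more rigorous" could mean in a CD pipeline: a staged rollout that halts as soon as a stage looks unhealthy. This is not CrowdStrike's actual pipeline. `Stage`, `staged_rollout`, `deploy`, `is_healthy`, and the host names are all hypothetical, and a real system would need timeouts, rollback, and real telemetry.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stage:
    name: str             # hypothetical stage label, e.g. "canary"
    hosts: Sequence[str]  # hosts that receive the update in this stage


def staged_rollout(
    stages: Sequence[Stage],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Deploy stage by stage and stop once a stage exceeds the failure budget.

    Returns the names of the stages that completed within the budget.
    """
    completed: List[str] = []
    for stage in stages:
        for host in stage.hosts:
            deploy(host)
        failures = sum(1 for host in stage.hosts if not is_healthy(host))
        if stage.hosts and failures / len(stage.hosts) > max_failure_rate:
            break  # halt: later (larger) stages never receive the update
        completed.append(stage.name)
    return completed


# Example with made-up hosts: one canary host turns unhealthy after the update.
stages = [
    Stage("internal", ["dev-1"]),
    Stage("canary", ["c-1", "c-2"]),
    Stage("fleet", ["p-1", "p-2", "p-3"]),
]
touched: List[str] = []
unhealthy = {"c-2"}
result = staged_rollout(stages, touched.append, lambda h: h not in unhealthy)
```

In this example only the internal and canary hosts are ever touched; the fleet stage is never reached because the canary stage exceeded the (zero) failure budget.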
The fact that they even listed "local developer testing" is pretty weird.

That is just part of the basic process and is hardly the thing that ensures a problem like this doesn't happen.

This also becomes a security issue at some point. If these updates can go in untested, what's to stop a rogue employee from deliberately pushing a malicious update?

I know insider threats are very hard to protect against in general but these companies must be the most juicy target for state actors. Imagine what you could do with kernel space code in emergency services, transport infrastructure and banks.