by fishywang 693 days ago
but it's kind of their fault? they designed the api that way, they decided what can be done in userland and what must be done via kernel. they at least _allowed_ it to happen every time.

> they designed the api that way, they decided what can be done in userland and what must be done via kernel

They didn’t have much of a choice - it is very hard to get adequate performance with real-time filesystem filtering without doing it in kernel mode. Not aware of any other mainstream OS which succeeds at that.

And they kind of had to provide this feature, since they’ve supported it since forever (antivirus vendors were already doing it back in the days of MS-DOS and Windows 3.x/9x/Me), and there is a lot of market demand for it. It is easy for Linux to say “no” when it has never had support for it (in official kernels).

But, as the blog post points out, it sounds like CrowdStrike is doing a lot of stuff in kernel mode that could be done in user mode instead, whether due to laziness, lack of investment, or lack of sophistication on the part of their product architects.

> they at least _allowed_ it to happen every time

Microsoft, in allowing third party code to be loaded into their kernel, is no different from other major OS kernels, such as Linux or Apple XNU.

Apple is (increasingly) the most restrictive about this, and a lot of people criticise them for it.

Even Linux imposes some restrictions, such as which kernel symbols to export (at all, or as GPL-only), although of course, being open source, you can circumvent all restrictions by changing the code and recompiling.

Mac and Linux run EDRs in userspace without an issue. No one here has an excuse or can claim they had no choice.
Can you re-read the list (source: Wikipedia) in one of the comments in the tree? It had Debian and Red Hat issues listed on different dates.
Linux these days tends to use eBPF, which isn't really in userspace per se.
eBPF is like the Twilight Zone. I'm in kernel space, but I'm not.
Well, CrowdStrike crashed a kernel with it.
Apparently that wasn't (entirely) CrowdStrike's fault: https://news.ycombinator.com/item?id=41030352

Whereas this Windows outage rather obviously was.

eBPF being able to crash the kernel is usually a sign of a kernel bug. And it sounds like in this case it was even a bug specific to Red Hat kernels, introduced by a Red Hat patch.

That said, even if they are triggering a Red Hat kernel bug, CrowdStrike should be testing their software adequately enough to pick up that issue before customers do – and it sounds like they haven't been

That was more of a kernel bug than a CrowdStrike bug. However, it's clear that they are pushing what you can do in kernel space to the limits, which is not a great sign.
Isn't being able to crash anything with eBPF a bug in either the kernel or eBPF? As I understand it, it's supposed to prevent exactly that.
eBPF is Linux denying the fact that it's turning into a microkernel and that Linus was wrong.
If you're right for 30 years in tech you're right, even if things eventually change.
When a parking valet takes a car on a joy ride and crashes into a tree, we could blame the tree. We could blame the car owner for handing over the key. We could blame the auto manufacturer that didn't provide a "valet mode". We could blame the police for not detecting the joy ride before the crash.

All of these parties could do better (stupid tree!). But the real problem is the valet.

We can say that it is obvious that the electronics-heavy cars of today should anticipate rogue valets and build in protections. But we shouldn't let rogue valets off the hook for damages.

As a consumer, you could choose to only purchase cars that have "valet mode". So should we blame consumers who don't? If so, we should blame the airlines, hospitals, etc.--not Microsoft.

How about we prosecute valets unless they refuse to park cars that don't have "valet mode"?

> All of these parties could do better (stupid tree!). But the real problem is the valet.

No, the operating system is supposed to provide secure access to hardware and isolate independent subsystems so they can't interfere with each other. That's its whole purpose for existing. The fact that people feel they need to deploy CS is a Microsoft failure. Windows is just not a secure OS.

> The fact that people feel they need to deploy CS is a Microsoft failure

They don't need to deploy shit. The only reason it's deployed is because it's a whole racket.

You’re shifting practically the entirety of the blame to a company that at best was an accomplice to the issue.

I get that you hate Microsoft, but not everything is their fault and it’s disingenuous to pretend otherwise.

> The fact that people feel they need to deploy CS is a Microsoft failure.

CS is also available and widely deployed on Mac and Linux. Is that a failure of Apple and all the distros? It literally took down Debian and Red Hat systems earlier this year, is that also not CS’s fault?

> I get that you hate Microsoft,

I don't.

> CS is also available and widely deployed on Mac and Linux. Is that a failure of Apple and all the distros

Yes. All widely deployed commodity operating systems have terrible security designs. None of them have access control systems that enable the principle of least privilege, let alone encourage or prioritize it, and none of them are written in robust languages that make verification of safety or security properties possible. Microsoft has made some headway on partial verification, but it's a far cry from what's needed.

> Yes. All widely deployed commodity operating systems have terrible security designs. None of them have access control systems that enable the principle of least privilege, let alone encourage or prioritize it, and none of them are written in robust languages that make verification of safety or security properties possible. Microsoft has made some headway on partial verification, but it's a far cry from what's needed.

What, exactly, is your solution then? To never use a computer again? Because that's certainly what it sounds like.

Secure, robust operating system designs have been known since the 1970s. KeyKOS, EROS, CapROS. All commodity systems instead use classic access control lists, subject to fundamentally unsolvable access control vulnerabilities. seL4 finally implemented those lessons but it's far from a commodity operating system.
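To make the ACL-versus-capability point concrete, here is a minimal sketch, assuming the kind of vulnerability meant is the classic "confused deputy" (the comment above does not name it). All class and function names are hypothetical, and Python objects are not real capabilities, since nothing stops code from reaching into private attributes; this only illustrates the shape of the two designs.

```python
class AclFileSystem:
    """Ambient authority: access is decided by *who* is asking."""

    def __init__(self):
        self.files = {"billing.log": "", "user_out.txt": ""}
        # Hypothetical ACL: which principals may write which file.
        self.acl = {
            "billing.log": {"compiler"},
            "user_out.txt": {"compiler", "user"},
        }

    def write(self, principal, path, data):
        if principal not in self.acl[path]:
            raise PermissionError(path)
        self.files[path] = data


def acl_compiler(fs, output_path):
    # The service runs as "compiler", so any path the caller names is
    # checked against the compiler's rights, not the caller's.
    fs.write("compiler", output_path, "debug output")


class WriteCap:
    """A capability: holding this object is the permission to write one file."""

    def __init__(self, store, path):
        self._store = store
        self._path = path

    def write(self, data):
        self._store[self._path] = data


def cap_compiler(output_cap):
    # The service can only write where the caller handed it authority.
    output_cap.write("debug output")


# ACL world: the user names a file they could not write themselves,
# and the compiler overwrites it on their behalf.
fs = AclFileSystem()
acl_compiler(fs, "billing.log")
acl_clobbered = fs.files["billing.log"] == "debug output"

# Capability world: the user can only pass along what they hold.
store = {"billing.log": "", "user_out.txt": ""}
user_cap = WriteCap(store, "user_out.txt")
cap_compiler(user_cap)
cap_clobbered = store["billing.log"] != ""

print(acl_clobbered, cap_clobbered)
```

In the ACL version the service cannot tell whose authority it is exercising; in the capability version the designation of the file and the permission to write it travel together, so there is nothing to confuse.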
You could also choose to park the car yourself or plan for a secondary mode of transportation if something happened to your car.

Not the best analogy. The organization who deploys said software is responsible for the uptime of their systems. They didn't have to use CrowdStrike and if they do they should have a plan in the event of failure.

You could also prosecute the establishment that keeps a valet with an abominable record on staff.

Microsoft took no steps to force-eject them from their ecosystem, despite their long history of issues.

Just to be clear within the analogy: are you expecting the auto manufacturers to "force-eject" any hotel on Park Ave that has a record of valet mishaps? Or did you mean individual cars should force-eject the valet?

If a Caesars Entertainment property in Macao has enough incidents, should GM update the firmware on their automobiles to force-eject valets at Caesars Entertainment properties in Las Vegas?

Now imagine that GM actually operates valet services in Macao and Las Vegas. Should they be allowed to force-eject valets from competing services?

I am not a Microsoft apologist. I think they should do better. I think Linux and FreeBSD should do better. I personally avoid Microsoft products. But I place more blame on people who use MS products than I do on MS. After all, I never intend to hand my beat up old Corolla over to a valet so why should I have to pay for a "valet mode" feature that Toyota is forced to build into all their cars? Isn't it reasonable that motorcycles, 18-passenger vans, and scooters don't need "valet mode"?

In my book, the auto manufacturer is lower on the list of culprits than the valet, "the establishment that keeps a valet with an abominable record on staff", and the vehicle owner. But some place like Car and Driver could definitely prioritize encouraging GM or Toyota to develop valet modes over berating owners; so I don't mind a place like HN shooting a few arrows at MS. Unless the general public follows their lead and lets bad guys off the hook by shifting too much focus to somebody lower on the list.

> Just to be clear within the analogy: are you expecting the auto manufacturers to "force-eject" any hotel on Park Ave that has a record of valet mishaps? Or did you mean individual cars should force-eject the valet?

Not OP, but I think the analogy here is the hotel "force-ejecting" (firing) the valet with a history of doing joy rides. That seems very reasonable.

In the analogy, it seems Microsoft is a car manufacturer. The hotel is the company that bought software from CrowdStrike. The problem is that Microsoft should not control who has access to which APIs; that is a huge can of worms, and is actually considered anticompetitive by the EU, from what I understand. At the MS level, either they publish APIs or they don't. If published, anyone should be able to write software for them. This is especially bad if MS themselves also sell security software that uses the same APIs. It would literally mean MS deciding who is allowed to compete with their security software.
I think it works better (please allow me to change it) if Microsoft is the hotel. CrowdStrike is the restaurant inside the hotel. The restaurant is serving poisoned food to the guests, who assume it is a decent restaurant because it is in their hotel.

Also the restaurant has their own entrance without security and questionable people are entering regularly, and they are sneaking into the hotel rooms and stealing some items, breaking the elevator.

At the same time, the hotel is in a litigation process with the restaurants association, because in the past they did not allow any restaurant on their premises. The guests, naturally, do not care about this, since their valuables have been stolen, and they have food poisoning. The reputation of the hotel is tarnished.

This is the correct interpretation. I am surprised that people took it in different directions.
I'm expecting restaurant owners to fire bad valets.

Or, in Microsoft's case, to prevent CrowdStrike from causing harm to their customers, whether by regulatory, social, or software means.

I'm aware it's a sticky regulatory situation, but CS has a history of these failings and the potential damage could be severe. Despite this, no effort (that I am aware of) was made by Microsoft to inform customers that CrowdStrike introduced potential risks, nor to inform regulators, nor to remove the APIs CS depends on.

I don't believe Microsoft is solely responsible, but I do believe that throwing all of the blame for the very real harm that was caused onto CS alone is missing a piece of the puzzle.

Last aside, every large corp has team(s) focused on risk. There's approximately zero chance they didn't discuss CS at some point. The only way this would not have happened is negligence.

Back in 2006 Microsoft tried to keep 3rd party vendors out of their ecosystem. <https://arstechnica.com/information-technology/2006/10/7998/> As a result of a complaint to the EU Microsoft was required to let them have kernel access. <https://www.theregister.com/2024/07/22/windows_crowdstrike_k...>
Microsoft was required to let them have the same access their own software used. Which seems fair to me. Microsoft can remove those APIs entirely, they just can't restrict them.
Can Microsoft legally ban a competitor for perceived incompetence? I doubt it, particularly seeing how much competence is shown with Windows and MS Teams software.
Microsoft assigns driver levels to these guys and allows them to load kernel mode components as protected. If they did not allow that, CS could not cause such damage. Of course, as you pointed out, this would then turn into some lawsuit blaming MS for killing competitors, even if they did it to try to protect their customers.

wonderful world.

> Microsoft took no steps to force-eject them from their ecosystem, despite their long history of issues.

I’m pretty sure antitrust law doesn’t allow Microsoft to go anywhere near that kind of action, even if they wanted to be more Apple-like.

The problem is that the establishment here is, well, the establishment. That is, the state itself, or at least one of them. Somehow MS is in a position where they will be prosecuted for any slight antitrust thing. Our system is set up to allow these actors in...
You can't just let people do anything from userland; the performance would tank. As for restricting kernelland, EU competition regulators would not be happy if MS was the only one able to write antivirus software that runs in kernelland.
> You can't just let people do anything from userland, the performance would tank

Isn't the point of userland that you can (try to) do anything from there?

It seems like macOS and Linux provide substantially safer alternatives that are still performant?

> As for restricting kernelland, EU competition regulators would not be happy

I keep seeing people say this. Is there a basis for that assertion, or is that mere speculation? Again, hasn't macOS already deprecated kexts?

There is basis for that assertion.

Via Google: https://www.techtarget.com/searchsecurity/news/450420491/Mic...

(Also via myself, as I was at MS when we wanted to make this change and the EU said no.)

Well Microsoft did not publicly commit to using the same APIs, and no privileged access, for its own antivirus products. That's why the EU said no way; not because kernel access was revoked.
Yes, but then of course Microsoft is being obligated to open part of kernelspace to competitors, which is arguably "OK" from a competitive regulation perspective, but that then places a special burden on competitors to maintain code hygiene given the potential for crashes. It makes CrowdStrike's negligence all the more unacceptable.
I believe what philistine is suggesting is that Microsoft could have implemented their own security offering using a safer alternative like eBPF, and then opened that interface to competitors as well.

I think that would have been a proactive approach. That said, I'm not entirely convinced that the EU was right to place the restriction in the first place.

The article you shared says that Kaspersky filed a complaint, but I didn't see a clear statement there about what the outcome was. I do now see other reputable sources reporting that an agreement was reached in 2009 where Microsoft promised to allow vendors the same access to the kernel its security software had [0].

I think a proactive approach might have been for Microsoft to provide safer interfaces with the kernel, and then use those in its own security offerings.

That said, it does sound like EU competition regulation was a contributing factor here, and I think the EU is wrong on this one and that an OS vendor should not be required to provide unrestricted kernel access to allow security software vendors to compete.

Mostly unrelated, it seems somewhat interesting that this was Kaspersky insisting on kernel access... The US government seems convinced they are compromised.

[0]: https://www.ft.com/content/60dde560-194a-40d1-8c98-1d96d6d01...

What are the Linux alternatives you are talking about?
macOS still keeps kext support around, even if the long-term roadmap is to move everything into userspace.
Your car _allows_ you to drive off a cliff. If you do so, it is your fault, not the fault of the car manufacturer.

Kind of weird that anyone is blaming Microsoft for any part of this, imo

Mmm… meaningless analogies are kind of meaningless?

More like:

If you install a security product that then prevents your car from starting; are they entirely blameless for letting you install it?

If you pull the hood up, tear off the “voids warranty” seal, ignore the “don’t open this” labels, crack the seals open and shove something into the engine… sure.

…but if you just slap a widget with the “vendor approved” sticker on your dash and it bricks your car; that’s a bit sucky right?

I do feel Microsoft is not entirely blameless in this.

It should be easier to recover from this kind of thing.

They should have been paying attention and made a fuss that one of the biggest security vendors has been doing this literally since they started.

I would bet money that until two weeks ago Microsoft was high-5ing them for best security practices.

It’s not “their fault” but they can’t just go “wasn’t us!”.

It was them.

It wasn’t macOS. It wasn’t *nix.

Suck it up. They should’ve done better.

Except CrowdStrike had 3 separate Linux incidents, including kernel panics, directly before this happened.
And at least one of them was actually a Red Hat kernel bug, where eBPF caused a kernel panic when it shouldn't be able to?
That is the problem: you feel.

Before Microsoft comes into the picture, the issue is CrowdStrike pushing updates without proper testing and selling a product whose update schedule customers cannot control, and customers being so naive that they don't check what the product they install on critical systems actually does.

The big difference is that CS is not the user. In your analogy it's like your car allows you to drive off a cliff, and an (almost) essential part of your car (for example, the pedal) drives the car off a cliff.
> CS is not the user

It got there because a user or administrator approved and installed it. It didn't just appear there, Microsoft didn't install it there. The user ran it.

Right, so a slightly better analogy would be if you wanted to install a remote starter, but then you find out that they can only be installed into Fords, because other auto manufacturers (Apple, Linux in this case) believe that tampering with the critical path (the engine, kernel) is unsafe. It isn't Ford who's at fault for allowing you to run some random engine modification, it's that mod that is at fault.
If it's a custom after market part, how can you blame the car manufacturer and not the part maker?
An OS flexible enough that you can do something stupid enough to completely break it.

Or basically iOS, which is so locked down you can't even run apps not expressly approved by Apple.

Pick one. If I build a bike and you remove the brakes to save weight, don't get mad at me when you crash.

Microsoft tried to lock down kernel access in the Windows Vista era. Antivirus vendors went crying to the EU and they forced Microsoft to allow access to the kernel to third parties.
i would have thought that in 2024 a bad driver update is something that windows would automatically roll back.

or at least provided some level of protection against crashes in third party kernel code.

If I understand the systems right, Windows can roll back a bad driver update, but the CS update wasn’t an update to the driver. Instead it updated a configuration file, which CS delivered outside of Windows Update. So from the Windows Update perspective, the system started failing to boot with no changes to the system. Again, though, I don’t know if I totally understand what CS did and what capabilities Windows Update has.
Good explanation about this point at 11:15 over at https://youtu.be/wAzEJxOo1ts?si=wGXDJZtUczcIui9F
It was not a driver update.
No you can’t roll back bad driver updates in any OS, if you could then by definition they do not sit in the kernel space. You just want the security code to not run in kernel space, which is a decision MS could maybe make and become like Apple, though most security software would in that case rebel.
> No you can’t roll back bad driver updates in any OS, if you could then by definition they do not sit in the kernel space.

drivers and kernel binaries are typically installed and maintained by user space programs that run with some sort of elevated privileges.

"kernel space" is just a runtime context; what gets loaded into there typically comes from ordinary (protected) files on the disk.

That doesn't make any sense.

The OS loads file A into the kernel. It crashes. It reboots. It decides not to load file A this time.

Wow, it's a rollback of kernel-space code.

Unless your argument is that you can't guarantee a rollback of every possible kernel driver, because it might have installed a rootkit while it had full control? Okay, cool, but this isn't a malware removal idea. It's an idea for normal drivers.
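The load, crash, reboot, skip sequence described above can be sketched as a small simulation. This is a toy model under stated assumptions, not how Windows or any real OS implements it: `BootState`, `boot`, and `flaky_loader` are hypothetical names, the "persistent" state is just an in-memory object standing in for something written to disk, and a crash is simulated with an exception.

```python
from dataclasses import dataclass, field


@dataclass
class BootState:
    """Stand-in for state that survives a reboot (e.g. a small file on disk)."""
    in_progress: set = field(default_factory=set)   # being loaded when the last boot ended
    quarantined: set = field(default_factory=set)   # modules we refuse to load again


def boot(state, modules, load):
    """Load each module, skipping any that crashed a previous boot.

    `load` stands in for the real loader and raises RuntimeError to simulate
    a kernel crash. Returns the list of loaded modules, or None if this boot
    crashed.
    """
    # Anything still marked in-progress never finished loading last time.
    state.quarantined |= state.in_progress
    state.in_progress.clear()

    loaded = []
    for name in modules:
        if name in state.quarantined:
            continue
        state.in_progress.add(name)       # recorded *before* the risky load
        try:
            load(name)
        except RuntimeError:
            return None                    # simulated crash: marker stays set
        state.in_progress.discard(name)    # load survived, clear the marker
        loaded.append(name)
    return loaded


def flaky_loader(name):
    # Hypothetical loader in which "file_a" always brings the system down.
    if name == "file_a":
        raise RuntimeError("simulated kernel crash")


state = BootState()
first = boot(state, ["disk", "file_a", "net"], flaky_loader)
second = boot(state, ["disk", "file_a", "net"], flaky_loader)
print(first, second)
```

The first boot crashes while loading `file_a`; the second boot sees the leftover marker, quarantines that module, and comes up without it. A real system would also have to decide what to do when the skipped module is required to boot at all, and, per the comment above about configuration files, would need to track data files a driver reads as well as the driver binary.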

it depends on how bad. in Linux you can rmmod to get rid of the bad one if you haven't wedged it and fix your code, compile, and try again. I can't imagine that's actually different on windows if you know what you're doing. how do you think driver development happens?
it's like a userland video driver - thousands of context switches per second, performance will dive...