
by sfink 2628 days ago

I read the paper, and they make a lot of good points about fork's warts.

But I really wanted some explanation of why Windows process startup seems to be so heavyweight. Why does anything that spawns lots of little independent processes take so bloody long on Windows?

I'm not saying "lots of processes on Windows is slow, lots of processes on Linux is fast, Windows uses CreateProcess, Linux uses fork, CreateProcess is an alternative to fork/exec, therefore fork/exec is better than any alternative." I can imagine all kinds of reasons for the observed behavior, few of which would prove that fork is a good model. But I still want to know what's going on.


ralish 2628 days ago

I'm a bit rusty on this but from memory the overhead is by and large specific to the Win32 environment. Creating a "raw" process is cheap and fast (as you'd reasonably expect), but there's a lot of additional initialisation that needs to occur for a "fully-fledged" Win32 process before it can start executing.

Beyond the raw Process and Thread kernel objects, which are represented by EPROCESS + KPROCESS and ETHREAD + KTHREAD structures in kernel address space, a Win32 process also needs to have:

- A PEB (Process Environment Block) structure in its user address space

- An associated CSR_PROCESS structure maintained by Csrss (Win32 subsystem user-mode)

- An associated W32PROCESS structure for Win32k (Win32 subsystem kernel-mode)

I'm pretty sure these days the W32PROCESS structure only gets created on-demand with the first creation of a GDI or USER object, so presumably CLI apps don't have to pay that price. But either way, those latter three structures are non-trivial. They are complicated structures and I assume involve a context switch (or several) at least for the Csrss component. At least some steps in the process also involve manipulating global data structures which block other process creation/destruction (Csrss steps only?).

I expect all this Win32-specific stuff largely doesn't apply to e.g. the Linux subsystem, and so creating processes should be much faster. The key takeaway is it's all the Win32 stuff that contributes the bulk of the overhead, not the fundamental process or thread primitives themselves.

EDIT: If you want to learn more, Mark Russinovich's Windows Internals has a whole chapter on process creation which I'm sure explains all this.

intea 2628 days ago

The WSL processes are called pico processes.

https://blogs.msdn.microsoft.com/wsl/2016/05/23/pico-process...

olemartinorg 2628 days ago

That was a super interesting read (and view), thank you. I've been in Linux land for almost two decades, but I've also spent a week (or so) porting our Linux-based development environment over to Windows with the help of WSL. This sheds some light on how it actually works. Maybe I'll have to look over it once more armed with this new information and see if I can squash some of those remaining problems with our solution.

pjmlp 2628 days ago

You can also dive into the Drawbridge research papers, and how they used the LibOS concept to bring SQL Server to Linux.

Gibbon1 2628 days ago

> created on-demand with the first creation of a GDI or USER object, so presumably CLI apps don't have to pay that price

This tickles my brain. I read some blog post bitching that because Windows DLLs are kinda heavyweight it's way too easy to end up paying that price without realizing it.

cesarb 2628 days ago

It probably was this one: https://randomascii.wordpress.com/2018/12/03/a-not-called-fu...

JdeBP 2628 days ago

As mentioned in this very discussion 2 hours before. (-:

* https://news.ycombinator.com/item?id=19622723

speedplane 2628 days ago

I used to work on a cross-platform project, and spent several weeks trying to figure out why our application ran significantly faster on Linux than Windows. One major culprit was process creation (another was file creation). I never really uncovered the true reason, but I suspect it had to do with the large number of DLLs that Windows would automatically link if you weren't very careful. Linux, of course, can also load shared code objects, but in my experience, they are smaller and lighter weight.

fanf2 2628 days ago

Anti-virus software makes process and file operations a lot slower.

kevin_b_er 2628 days ago

This should not be ignored. Windows machines are a favorite for having lots of heavy anti-virus running on them. They can destroy I/O performance. Windows 10 has a "real time scanner" running by default, but many corporate IT security teams will add more and more. This alone can seriously slow down Windows vs Linux.

speedplane 2622 days ago

> Anti-virus software makes process and file operations a lot slower.

It was a long time ago (~2006), and I honestly can't remember, but I feel like turning off anti-virus (and also backups, software updaters, and any other resident software) would have been one of the first things I would have checked. There was definitely something more fundamental going on.

zenexer 2628 days ago

This probably isn't the technical explanation you're looking for, but, in general, processes on Windows and processes on Unix aren't the same--or, at least, they're not meant to be used the same way. Creating lots of small processes on Windows has long been discouraged and considered poor design, whereas the opposite is true on Unix.

One could probably argue that processes on Windows need to be lighter-weight now that sandboxing is a common security practice. These days, programs like web browsers opt to create a large number of processes both for security and stability purposes. In much the same way that POSIX should deprecate the fork model, Windows should provide lighter-weight processes.

wvenable 2628 days ago

Windows now has minimal processes that have almost no setup and pico processes (based on minimal processes) that are the foundation for Linux processes in WSL.

waterhouse 2628 days ago

The last time I used WSL (perhaps 6 months ago), its per-process overhead was awful. I don't recall the numbers, but I think it managed to start fewer than 10 processes per second. My memory suggests it was more like two processes per second, though I would recommend re-testing before trusting that.

Found my previous comment on it (which has a test case but not numbers): https://news.ycombinator.com/item?id=18226921
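For anyone who wants to re-test, here is a rough sketch of how a processes-per-second number could be measured. This is not the test case from the linked comment; `spawn_rate` and the `python -c pass` workload are made up for illustration, and interpreter startup is included in the timing, so it overstates the pure process-creation cost.

```python
import subprocess
import sys
import time


def spawn_rate(count=20):
    """Start `count` trivial child processes one after another and
    return how many were started and reaped per second."""
    # Hypothetical workload: a Python interpreter that exits immediately.
    cmd = [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(count):
        # run() blocks until the child exits, so spawns are sequential.
        subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    print(f"{spawn_rate():.1f} processes/second")
```

Running the same script natively, under WSL, and in a Linux VM on the same machine would give a like-for-like comparison, with and without the anti-virus enabled.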

temac 2628 days ago

On 1803 and 1903 it's 3 to 4 times faster than MsysGit (WSL is ~1s on my laptops). It is possibly slightly faster on 1903 as my laptop running it is faster than the other for this bench, despite having an older processor.

Now in a Linux VM it's approx 10 times faster than even WSL. And that should probably be even faster natively.

So anyway WSL is really usable and if you really only started 10 processes per sec something is wrong. Maybe you are using a crappy antivirus (I've heard that Kaspersky makes WSL extremely slow)

waterhouse 2628 days ago

Well, I hadn't installed any antiviruses myself. I think Windows Defender was running, though. It's possible that my computer came with additional crapware on it.

temac 2628 days ago

I just checked and both of my benchmarks were done with Defender.

When I disable it, it is down to ~0.5s

I would not build a Linux kernel here instead of in a VM, but for tons of things, this is very usable.

SifJar 2628 days ago

Others have mentioned DLLs being pulled in; the following post might be interesting:

https://randomascii.wordpress.com/2018/12/03/a-not-called-fu...

chris_wot 2628 days ago

It's not process creation that is tricky, it's process termination!

To see how LibreOffice does it, see https://opengrok.libreoffice.org/xref/core/sal/osl/w32/proce...

naasking 2628 days ago

Microsoft Research doesn't just do research on Windows. They employ lots of researchers that are free to pursue many different topics.

kazinator 2628 days ago

CreateProcess requires an application to initialize from scratch. When you fork, you cheaply inherit the initialized state of the whole application image. Only a few pages that are mutated have to be subject to copy-on-write. Even that copy-on-write is cheaper than calculating the contents of those pages from scratch.
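A minimal sketch of that point, assuming a POSIX system (`os.fork` is not available on native Windows). The names `expensive_init` and `lookup_in_forked_child` are invented for the example: the parent builds a table once, and the forked child reads it without redoing the setup.

```python
import os


def expensive_init():
    # Stand-in for costly application setup (hypothetical workload).
    return {n: n * n for n in range(100_000)}


def lookup_in_forked_child(table, key):
    """Fork, let the child read `table` without rebuilding it, and
    return the value the child reports back through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # POSIX only
    if pid == 0:
        # Child: `table` is already in memory, inherited from the parent.
        os.close(read_fd)
        os.write(write_fd, str(table[key]).encode())
        os.close(write_fd)
        os._exit(0)  # skip the parent's cleanup handlers in the child
    # Parent: collect the child's answer, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data)


if __name__ == "__main__":
    table = expensive_init()
    print(lookup_in_forked_child(table, 12))
```

One caveat with using Python for this: CPython updates reference counts even on reads, so the child dirties more pages than an equivalent C program would, and the copy-on-write saving is smaller than the sketch might suggest.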

JdeBP 2628 days ago

There has been a lot of discussion in recent years about how cheap that "cheaply" really is.

* https://news.ycombinator.com/item?id=9653238

* https://news.ycombinator.com/item?id=18071278

* https://news.ycombinator.com/item?id=19622503

cryptonector 2628 days ago

Yeah, it's not really cheap at all. However! vfork() is cheap, very very cheap, though, of course, you then have to follow it up with an exec(), and the cost of that on Windows depends on the setup cost of the executable being exec'ed.

Part of the problem is the DLLs, as many have mentioned, and also the fact that each statically links in its own CRT (C run-time). The shared C run-time MSFT is working on should help here. As should more lazy loading and setup.
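The vfork()-then-exec() pattern is roughly what `posix_spawn` packages up as a single call. Python has no `os.vfork`, so the sketch below uses `os.posix_spawn` instead; whether the C library implements it with vfork-like semantics is platform dependent and is an assumption here. `spawn_and_wait` is a made-up helper name.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start the program at argv[0] with os.posix_spawn and return its
    exit code, or None if it did not exit normally."""
    # One call creates the child and loads the new program, so the caller
    # never runs its own code in a copy of the parent.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # killed by a signal


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
    print("child exit code:", code)
```

The setup cost of the executable being started (dynamic linking, run-time initialisation) is still paid in the child, which is the part the DLL discussion is about.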

dblohm7 2628 days ago

> Part of the problem is the DLLs, as many have mentioned, and also the fact that each statically links in its own CRT (C run-time)

No, that isn't the case on DLLs shipped with Windows.

cryptonector 2628 days ago

But it is for 3rd party DLLs.

dblohm7 2627 days ago

Well, that depends on the DLL.

kazinator 2627 days ago

fork is pretty much always going to be cheaper than starting a new process from scratch over the same executable image (and library images) and then replaying everything inside that process so that it gets into exactly the same state as the creator to be a de facto clone of it.

muststopmyths 2628 days ago

If I had to guess, I'd point to DLLs. The minimal Windows process loads probably half a dozen, plus the entry points are called in a serialized manner.

richardwhiuk 2628 days ago

Pretty much identical to shared objects on Linux

chungleong 2628 days ago

Windows DLLs require fixups when they're loaded off their preferred base address.

pjc50 2628 days ago

So do relocatable shared libraries on Linux. https://eli.thegreenplace.net/2011/08/25/load-time-relocatio...