by balloneij 1896 days ago
Windows' slow thread spawn time is incredibly noticeable when you use Magit in Emacs.

It runs a bunch of separate git commands to populate a detailed buffer. It's instantaneous on macOS, but I have to sit and stare on Windows.

5 comments

Do you mean *process* spawn time?

From the article:

> On Windows, assume a new process will take 10-30ms to spawn. On Linux, new processes (often via fork() + exec()) will take single digit milliseconds to spawn, if that.

> However, thread creation on Windows is very fast (~dozens of microseconds).
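For anyone who wants to check the quoted numbers on their own machine, here is a rough Python sketch (not from the article) that times process spawns against thread starts. The helper names and the choice of spawning `python -c pass` are my own assumptions. Interpreter startup inflates the process figure, so treat it as an upper bound rather than a pure process-creation benchmark.

```python
import subprocess
import sys
import threading
import time


def time_process_spawn(runs=5):
    """Average seconds to start a trivial child process and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        # Spawns the current interpreter, so interpreter startup is included.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def time_thread_start(runs=200):
    """Average seconds to create, start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(runs):
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / runs


def report():
    """Format both averages, process time in ms and thread time in microseconds."""
    proc = time_process_spawn()
    thread = time_thread_start()
    return f"process: {proc * 1e3:.2f} ms, thread: {thread * 1e6:.1f} us"
```

Calling `print(report())` on Windows and on Linux (or under WSL) should make the gap between the two platforms visible.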

Yes, they clearly mean process spawn time:

> It runs a bunch of separate git commands

One of many reasons why I prefer to run Emacs under WSL1 when on Windows. WSL1 has faster process start times.

But then with git, there are other challenges. It took me a while to make Magit usable on our codebase (that for various reasons needs to be on the Windows side of the filesystem) - the main culprits were submodules, and someone's bright recommendation to configure git to query submodules when running git status.

Here are the things I did to get Magit status on our large codebase to show in a reasonable time (around 1-2 seconds):

- git config --global core.preloadindex true # This should be defaulted to true, but sometimes might not be; it ensures git operations parallelize looking at index.

- git config --global gc.auto 256 # Reduce GC threshold; didn't do much in my case, but everyone recommends it in case of performance problems on Windows...

- git config status.submoduleSummary false # This did the trick! It significantly cut down time to show status output.

Unfortunately, it turned out that even with submoduleSummary=false, git status still checks if submodules are there, which impacts performance. On the command line, you can use the --ignore-submodules argument to solve this, but for Magit, I didn't find an easy way to configure it (and didn't want to defadvice the function that builds the status buffer), so I ended up editing .git/config and adding "ignore = all" to every single submodule entry in that config.
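The manual .git/config edit could also be scripted. A minimal sketch, assuming the submodules show up in the config as `[submodule "name"]` sections. `submodule_names` and `ignore_commands` are hypothetical helpers, and the sample config is made up; `submodule.<name>.ignore` is the git config key that corresponds to the "ignore = all" line.

```python
import re

# Matches section headers such as: [submodule "libs/foo"]
SECTION = re.compile(r'^[ \t]*\[submodule "(.+)"\][ \t]*$', re.MULTILINE)


def submodule_names(config_text):
    """Return the names of all [submodule "..."] sections in a git config."""
    return SECTION.findall(config_text)


def ignore_commands(names):
    """Build one `git config` invocation per submodule, setting ignore = all."""
    return [["git", "config", f"submodule.{name}.ignore", "all"] for name in names]


# Made-up config text standing in for the contents of .git/config.
sample = '''[core]
	bare = false
[submodule "libs/foo"]
	url = https://example.invalid/foo.git
[submodule "libs/bar"]
	url = https://example.invalid/bar.git
'''

for command in ignore_commands(submodule_names(sample)):
    print(" ".join(command))
```

The commands are only printed here, not executed, so they can be reviewed before being run inside the repository.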

With this, finally, I get around ~1s for Magit status (and about 0.5s for raw git status). It only gets longer if I issue a git command against the same repo from the Windows side - git detects the index isn't correct for the platform, and rebuilds it, which takes several seconds.

Final note: if you want to check why Git is running slow on your end, set GIT_TRACE_PERFORMANCE to true before running your command[0], and you'll learn a lot. That's how I discovered submoduleSummary = false doesn't prevent git status from poking submodules.

--

[0] - https://git-scm.com/docs/git, ctrl+f GIT_TRACE_PERFORMANCE. Other values are 1, 2 (equivalent to true), or n, where n > 2, to output to a file descriptor instead of stderr.
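If you'd rather collect the trace from a script than from the shell, something like the following Python sketch should work. `trace_env` and `run_traced` are hypothetical helper names, and the sketch assumes `git` is on the PATH. It relies only on the GIT_TRACE_PERFORMANCE variable described above, which sends the trace to stderr when set to true.

```python
import os
import subprocess


def trace_env(base=None):
    """Copy an environment and enable git's performance tracing in it."""
    env = dict(os.environ if base is None else base)
    env["GIT_TRACE_PERFORMANCE"] = "true"  # "true" sends the trace to stderr
    return env


def run_traced(args, cwd=None):
    """Run a git subcommand with tracing on and return its stderr lines."""
    result = subprocess.run(
        ["git", *args],
        cwd=cwd,
        env=trace_env(),
        capture_output=True,
        text=True,
    )
    return result.stderr.splitlines()
```

For example, `run_traced(["status"], cwd="path/to/repo")` would return the trace lines for a single git status run in that repository.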

Wow, that's very helpful. I'll give it a shot next time I'm at work.

To be precise, are you saying WSL1 is faster compared to Windows, or compared to WSL2? With WSL2 (and the native-comp Emacs branch) I've never noticed any unusual slowdowns with Magit or anything else.

I haven't tried WSL1.

WSL1 process creation is faster compared to Windows, because part of the black magic it does to run Linux processes on the NT kernel is using minimal processes - so-called "pico processes"[0]. These are much leaner than standard Windows processes, and more suited to UNIX-style workflows.

I can't say if it's faster relative to WSL2, but I'd guess so. WSL2 is a full VM, after all.

--

[0] - https://docs.microsoft.com/en-us/archive/blogs/wsl/pico-proc...

It shouldn't actually be a noticeable difference. HW virtualization means that unless the guest is doing I/O or needs to be interrupted to yield to the host, the guest is kind of just doing its thing. Spawning a new user space process in a VM should, in theory, be basically the same speed as spawning a new user space process on the bare metal. How that compares to the WSL1 approach of pico processes I don't know, but Linux generally has a very optimized path for spawning a process that I would imagine is competitive.

Yeah, I hope this is one of the issues Microsoft addresses at some point, because although CreateProcess is a slightly nicer API in some regards, the cost is very high. It may not be possible to fix it without removing backwards-compatibility, but maybe we could have a new "lite" API.

The bit about Windows Defender being hooked into every process is also infuriating. We pay a high price for malware existing even if we're never hit by it.

Yes. This makes me wonder if I could speed up our builds by 2x by whitelisting the source repository folder. If it's at all possible (and company policy allows for it)...

One thing that deeply frustrates me is that I simply don't know which things are slowed down by Defender. I can add my source repos to some "exclude folder" list deep in the Defender settings, but I've yet to figure out whether that actually does something, whether I'm doing it right, or whether I should whitelist processes instead of folders, or both.

If anyone here knows how to actually see which files Defender scans / slows down, then that would be awesome. Right now it's a black box and it feels like I'm doing it wrong, and it's easily the thing I dislike the most about developing on Windows.

Writing things that do a lot of forking, like using the multiprocessing or subprocess modules in Python, is basically unusable for my coworkers who use Windows.

Startup time for those processes goes from basically instant to 30+ seconds.

I researched this a little bit and it seems that it may be related to DEP.

It's basically just Windows: Back when the current Windows architecture was designed (OS/2 and Windows NT going forward--not Win9x) the primary purpose of any given PC was to run one application at a time. Sure, you could switch applications and that was well accounted for, but the entire concept was that one application would always be in focus, and pretty much everything from a process/memory/file system standpoint is based around this assumption.

Even for servers the concept was and is still just one (Windows) server per function. If you were running MSSQL on a Domain Controller this was considered bad form/you're doing something wrong.

The "big change" with the switch to the NT kernel in Windows 2000 was "proper" multi-user permissions/access controls but again, the assumption was that only one user would be using the PC at a time. Even if it was a server! Windows Terminal Server was special in a number of ways that I won't get into here but know that a lot of problems folks had with that product (and one of many reasons why it was never widely adopted) were due to the fact that it was basically just a hack on top of an architecture that wasn't made for that sort of thing.

Also, back then PC applications didn't have too many files and they tended to be much bigger than their Unix counterparts. Based on this assumption they built in hooks into the kernel that allow 3rd party applications to scan every file on use/close. This in itself was a hack of sorts to work around the problem of viruses which really only exist because Windows makes all files executable by default. Unfortunately by the time Microsoft realized their mistake it was too late to change it and would break (fundamental) backwards compatibility.

All this and more is the primary reason why file system and forking/new process performance is so bad on Windows. Everything that supposedly mitigates these problems (keeping one process open/using threads instead of forking, using OS copy utilities instead of copying files via your code, etc.) is really just a hack to work around what is fundamentally a legacy/out-of-date OS architecture.

Don't get me wrong: Microsoft has kept the OS basically the same for nearly 30 years because it's super convenient for end users. It probably was a good business decision, but I think we can all agree at this point that it has long since fallen behind the times when it comes to technical capabilities. Everything we do to make our apps work better on Windows these days is basically just a workaround or a hack, and there doesn't appear to be anything coming down the pipe to change this.
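For reference, the "threads in one long-lived process instead of a process per task" workaround might look roughly like this in Python. This is only a sketch: `checksum` is a hypothetical stand-in for whatever per-item work would otherwise get its own process, and the worker count is arbitrary.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def checksum(data):
    """Stand-in for per-item work that might otherwise get its own process."""
    return zlib.crc32(data)


def checksum_all(items, workers=4):
    """Run the work on a reusable pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, items))
```

The pool pays the thread creation cost once and reuses the workers for every item. In CPython this mostly helps I/O-bound work, since the GIL limits CPU-bound threads.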

My guess is that Microsoft has a secret new OS (written from scratch) that's super modern and efficient and they're just waiting for the market opportunity to finally ditch Windows and bring out that new thing. I doubt it'll ever happen though because for "new" stuff (where you have to write all your stuff from scratch all over again) everyone expects the OS to be free.

> Also, back then PC applications didn't have too many files and they tended to be much bigger than their Unix counterparts.

Okay, let me interrupt you right here. To this very day Linux has a default maximum number of file descriptors per process of 1024. And select(3), in fact, can't be persuaded to use FDs larger than 1023 without recompiling libc.

Now let's look at Windows XP Home Edition -- you can write a loop of "for (int i = 0; i < 1000000; i++) { char tmp[100]; sprintf(tmp, "%d", i); CreateFile(tmp, GENERIC_ALL, FILE_SHARE_READ, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); }" and it will dutifully open a million file handles in a single process (although it'll take quite some time) with no complaints at all. Also, on Windows, select(3) takes an arbitrary number of socket handles.

I dunno, but it looks to me like Windows was actually designed to handle applications that would work with lots of files simultaneously.

> fundamentally a legacy/out-of-date OS architecture

You probably wanted to write "badly designed OS architecture", because Linux (if you count it as continuation of UNIX) is actually an older OS architecture than Windows.

1024 is a soft limit you can change through ulimit.

The actual limit can be seen via 'sysctl fs.file-max'. On my stock install it's 13160005.
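The same numbers can be read programmatically. A small sketch, assuming a Unix-like system (the `resource` module does not exist on Windows); `fd_limits` and `system_file_max` are hypothetical helper names. As far as I know, fs.file-max is the system-wide ceiling on open file handles, while the per-process soft and hard limits come from RLIMIT_NOFILE.

```python
import os
import resource  # Unix only; not available on Windows


def fd_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_file_max(path="/proc/sys/fs/file-max"):
    """Return the Linux system-wide file handle limit, or None if unavailable."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return int(handle.read().strip())


soft, hard = fd_limits()
print(f"soft={soft} hard={hard} file-max={system_file_max()}")
```

A process can raise its own soft limit up to the hard limit with `resource.setrlimit`, which is the in-process equivalent of ulimit -n.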

> I doubt it'll ever happen though because for "new" stuff (where you have to write all your stuff from scratch all over again) everyone expects the OS to be free.

I think one way they could pull it off is to do a WSL2 with Windows - run the NT kernel as a VM on the new OS.

As for the price, I think they're already heading there. They already officially consider Windows to be a service - I'm guessing they're just not finished getting everyone properly addicted to the cloud. If they turn Windows into SaaS execution platform, they may just as well start giving it away for free.

>My guess is that Microsoft has a secret new OS (written from scratch) that's super modern and efficient and they're just waiting for the market opportunity to finally ditch Windows and bring out that new thing. I doubt it'll ever happen though because for "new" stuff (where you have to write all your stuff from scratch all over again) everyone expects the OS to be free.

https://en.wikipedia.org/wiki/Midori_%28operating_system%29

>My guess is that Microsoft has a secret new OS (written from scratch) that's super modern and efficient and they're just waiting for the market opportunity to finally ditch Windows and bring out that new thing. I doubt it'll ever happen though because for "new" stuff (where you have to write all your stuff from scratch all over again) everyone expects the OS to be free.

More and more gets offloaded onto WSL: anything that doesn't need interactive graphics or interoperability through the traditional Windows IPC mechanisms.

In my experience, Magit is slow even on Linux. On my small repos at home, subjectively magit-status seems to take around 0.2-0.3 seconds. And that's just status, the most basic information you ask of git. Committing is several times slower. On a large codebase at work, magit-status usually takes around 10 seconds, sometimes longer. Again, I'm usually running it to just check some basic metadata (what branch I'm on, do I have a dirty tree, if yes, then what files are changed), so it's frustrating to wait. Honestly, I'd expect stuff like that to update effortlessly in real time without me issuing any commands. This is what happens in some other editors. However, currently I'm glued to Emacs because of Tramp for working remotely in a nice GUI and org-mode for time-tracking (TaskWarrior/TimeWarrior isn't for me).

I prefer Fork on Windows and Mac (prefer the Windows version for aesthetic reasons). Unfortunately, it's not available for Linux.