by gouranga 5108 days ago
I'm not a fan of PowerShell, but I have to use it in my line of work.

The only thing that I find sucks is that the pipeline is slow as snails. For example, an svnadmin dump redirected to a file, which takes 8 mins in cmd.exe, takes 14 hours in PowerShell...

Apart from that it's bearable!

3 comments

> The only thing that I find sucks is that the pipeline is slow as snails.

This is because CreateProcess in Windows is slow. It's the reason that running make under Cygwin on Windows, even for not-that-large Makefiles, is really, really slow. The same Makefile differs in startup time between UNIX and Windows by a wide margin. It's really painful to type "make ..." and sit there for 30 seconds on a fast machine.

CreateProcess is only being called once in this case i.e. to spawn svnadmin. The exact script does the following:

   svnadmin dump d:\repo > repo.dump
The output from svnadmin has a lot of lines. Because PS is written on top of the CLR, it reads each line into an immutable string before writing it to a file. So for every line it has to create a new System.String object and, as another poster said, GC it later. Also, as lines are not of a predictable length, it has to buffer them, resulting in more overhead.

Effectively, where *NIX shells use a fixed-size buffer for pipe operations and operate on streams, PS has to convert the output to lines first before writing it out.

That doesn't work when you have approximately 25 bytes per line and a 12 GB file, which is where the issue is.
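
To make the difference concrete, here is a minimal Python sketch of the two copy strategies described above. It is only an illustration, not PowerShell's actual implementation: the function names `copy_chunked` and `copy_by_line` and the 64 KiB buffer size are assumptions made up for this example. At roughly 25 bytes per line, a 12 GB dump is on the order of half a billion lines, so the per-line variant would create about that many short-lived string objects.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed buffer size, chosen only for illustration


def copy_chunked(src, dst, chunk_size=CHUNK_SIZE):
    """Copy raw bytes through one fixed-size buffer; returns the number of reads."""
    reads = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        dst.write(chunk)
        reads += 1
    return reads


def copy_by_line(src, dst, encoding="utf-8"):
    """Decode every line into a new string, then re-encode and write it."""
    lines = 0
    for raw in src:  # iterating a binary stream yields one bytes object per line
        text = raw.decode(encoding)  # one fresh immutable string per line
        dst.write(text.encode(encoding))
        lines += 1
    return lines


if __name__ == "__main__":
    line = b"x" * 24 + b"\n"  # 25 bytes per line, as in the comment above
    payload = line * 100_000
    a, b = io.BytesIO(), io.BytesIO()
    print("chunk reads:", copy_chunked(io.BytesIO(payload), a))
    print("line objects:", copy_by_line(io.BytesIO(payload), b))
    assert a.getvalue() == b.getvalue() == payload
```

Both functions produce byte-identical output; the only difference is how many intermediate objects are created along the way, which is the overhead being discussed.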

I can appreciate the technical explanation as a programmer, but as an end-user of PS: I don't care. It's slow.
For the majority of tasks it's fast enough. There are a few edge cases though.
That's when doing it the Enterprisey way bites you in the ass.
I don't understand your comment. There's nothing enterprisey about it.
I mean that from Microsoft's perspective. They apparently decided to put an object around something as simple and essential to performance as a line buffer. That's where they should have hired a systems programmer to do the job.

Don't get me wrong, I actually like their approach in developing an OO shell, but if it hurts performance that much, someone has taken that paradigm too far. It's the typical case of someone with a hammer (an OO programmer) treating everything as if it were a nail.

That's a fair evaluation and one I agree with entirely.
Really, 14 hours?!

Have you tried to change the way the dump is done?

Well, you can only redirect to a file, as the output is on stdout. We just ran it through cmd.exe instead. 8 mins is fine - it's a 12 GB repo.

I assume it's related to PS converting every line into a System.String.

Followed by massive GC most likely. Thanks for sharing.
Perhaps the real solution to the performance issue you mention is to have svnadmin actually write to a target file instead of crossing process boundaries to redirect stdout?

Have you tried:

Start-Process '{PATHTOSUBVERSION}\svnadmin' -argumentlist "dump {PATH_TO_REPOSITORY}" -RedirectStandardOutput c:\temp\repodump.dmp -Wait

More examples here: http://www.youtube.com/watch?v=9sn2L0E5jT8
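
For comparison, the same idea (letting the child process write straight to the file so the parent never touches the data) can be sketched in Python. This is an illustration only, not a replacement for the command above: the helper name `run_to_file` is hypothetical, and the demo spawns the Python interpreter itself as a stand-in rather than svnadmin.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_to_file(argv, out_path):
    """Run argv with its stdout attached directly to out_path.

    The parent never reads the child's output; the child inherits the
    file handle, so no per-line processing happens in this process.
    """
    with open(out_path, "wb") as out:
        completed = subprocess.run(argv, stdout=out, check=True)
    return completed.returncode


if __name__ == "__main__":
    target = Path(tempfile.gettempdir()) / "run_to_file_demo.txt"
    # Stand-in child process; in the thread's case argv would be the
    # svnadmin dump command instead.
    run_to_file([sys.executable, "-c", "print('hello')"], target)
    print(target.read_bytes())
```

With `check=True`, a non-zero exit code from the child raises `subprocess.CalledProcessError`, so a failed dump would not pass silently.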

Never tried that. I will have a go. Thanks for the suggestion.

Unfortunately, it's very verbose for a simple case that works fine in cmd, so I'm not sure it warrants using that tool.