by barrkel 5108 days ago
Almost nothing, except in-memory processing, requires 64-bit for dealing with files over 2GB. Piping with utilities is the Unix way, and it works well with files of any size in Cygwin.

Cygwin commands that run slower than Windows counterparts are typically those that are syscall heavy, where those syscalls are significantly different on Windows and need lots of work for emulation. The biggie is fork(); it's better by far to write scripts etc. in such a way that they stream results rather than iterate and create new processes.

So, for example, rather than write a script that converts Unix paths to Windows paths with iterated calls to cygpath -w, instead pipe the paths to cygpath -w -f -. Rather than use pipe-to-sed (like "$(echo $foo | sed 's|bar|baz|')") for ad-hoc edits, try to use shell substitutions instead (like "${foo/bar/baz}").
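A minimal sketch of both suggestions in bash. The variable names and sample paths are made up for illustration, and the cygpath part is guarded because cygpath exists only under Cygwin.

```shell
#!/usr/bin/env bash
foo="foo bar qux"

# Slow: forks a subshell plus a sed process for one small edit.
slow=$(echo "$foo" | sed 's|bar|baz|')

# Fast: bash parameter expansion, no new process at all.
fast=${foo/bar/baz}

echo "$slow"
echo "$fast"

# Batch path conversion: one cygpath process for the whole list
# instead of one process per path. Skipped where cygpath is absent.
if command -v cygpath >/dev/null 2>&1; then
    printf '%s\n' /usr/bin /tmp | cygpath -w -f -
fi
```

One caveat: the pattern in "${foo/bar/baz}" is a shell glob, not a regular expression, so the two forms are only interchangeable for plain strings like this one.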

Another thing that can be slower in Cygwin is find, when run over very large directory trees. I wrote a wrapper script (I call it rdir) that runs "cmd.exe /c dir /b" and massages the output into a Cygwin-style format. I also have the same script written in terms of find, so that my scripts that use it work on Windows, Solaris, Linux and OS X.
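The actual rdir script isn't shown in this thread, so the following is only a guess at its shape. The function name rdir_sketch, the extra /s and /a-d switches, and the absolute-path output format are all assumptions; quoting of paths with spaces on the cmd.exe side is not handled.

```shell
#!/usr/bin/env bash
# rdir_sketch DIR: hypothetical stand-in for the wrapper described above.
# Prints every file under DIR, one absolute POSIX path per line.
rdir_sketch() {
    local dir abs
    dir=${1:-.}
    abs=$(cd "$dir" && pwd) || return 1
    if command -v cygpath >/dev/null 2>&1 && command -v cmd.exe >/dev/null 2>&1; then
        # Cygwin branch: let cmd.exe walk the tree (/s recursive, /b bare
        # names, /a-d files only), strip carriage returns, then convert
        # all paths back to POSIX form with a single cygpath process.
        cmd.exe /c dir /s /b /a-d "$(cygpath -w "$abs")" | tr -d '\r' | cygpath -u -f -
    else
        # Everywhere else: plain find gives the same kind of listing.
        find "$abs" -type f
    fi
}
```

Scripts that call the function see the same output format on either branch, which is what lets them run unchanged across systems.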

But I have to say, the biggest limiting factor in me solving ad-hoc problems is composing the tools available, rather than the actual runtime speed of the tools themselves. Having all the Unix tools available makes my life far easier in this respect. They could be even slower, and I wouldn't mind, because I would still be saving lots of time compared to what Windows provides; and my scripts usually also work on all my other systems running different OSes.

PowerShell doesn't even support the kind of simple fork-join that bash makes trivial:

    for x in {1..10}; do (sleep $x; echo $x) & done; wait

I use this idiom a lot when dealing with lots of multi-gigabyte files. PowerShell is mostly useful to me when I need to access Windows-specific stuff that Cygwin doesn't do well, like WMI.
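A sketch of how the same fork-join idiom might be applied to a set of files. The function name count_lines_parallel is hypothetical, and wc -l is just a placeholder for whatever per-file work is needed.

```shell
#!/usr/bin/env bash
# count_lines_parallel FILE...: hypothetical helper. Starts one background
# subshell per file, then joins on all of them with wait.
count_lines_parallel() {
    local f
    for f in "$@"; do
        # Fork: each parenthesised group runs as its own child process.
        ( n=$(wc -l < "$f"); echo "$((n)) $f" ) &
    done
    # Join: block until every background job has finished.
    wait
}
```

Results arrive in completion order, not argument order, so pipe the output through sort if the order matters.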

> Almost nothing, except in-memory processing, requires 64-bit for dealing with files over 2GB. Piping with utilities is the Unix way, and it works well with files of any size in Cygwin.

Even ls and wc -c report bogus results with >2GB files. less does not work even if I want to look at just the first few hundred lines (and "head -n 1000 | less" is a horrible workaround).

> Cygwin commands that run slower than Windows counterparts are typically those that are syscall heavy

Most Unix commands are syscall/filesystem/IO heavy; after all, they are file utilities. What you say about find is exactly what I'm talking about. I find that the Unix tools ported to Win32 and compiled with mingw are significantly faster.

Eh? What you state about ls, wc and less is directly contrary to my experience. I'm so astonished I created a 30GB test file and tested it:

    $ cmd /c dir k.txt
     Volume in drive C is CobraRoot
     Volume Serial Number is 02D8-502C
    
     Directory of C:\Users\barrkel\AppData\Local\Temp
    
    2012-07-01  15:24    31,292,160,000 k.txt
                   1 File(s) 31,292,160,000 bytes
                   0 Dir(s)  142,087,471,104 bytes free
    
    $ du -h k.txt
    30G     k.txt
    $ ls -l k.txt
    -rw-r--r--+ 1 barrkel None 31292160000 Jul  1 15:24 k.txt
    $ wc -c k.txt
    31292160000 k.txt
    $ time wc -l k.txt
    6400000000 k.txt
    
    real    1m5.651s
    user    0m46.207s
    sys     0m8.642s

66 seconds to read 30GB isn't too bad; that's over 400MB/sec. (It's an SSD.) When I said directory listings could be slow, I meant directory listings, not general I/O; simple read() and write() do not need translation (provided you aren't using Cygwin text-mode mount options, which are not recommended).
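The throughput figure can be checked from the numbers in the transcript. A small sketch, where the function name mb_per_sec is hypothetical and 1 MB is taken as 10^6 bytes:

```shell
#!/usr/bin/env bash
# mb_per_sec BYTES SECONDS: print throughput in whole MB/s (1 MB = 10^6 bytes).
mb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / secs / 1e6 }'
}

# File size from ls -l and the "real" time from the wc -l run.
mb_per_sec 31292160000 65.651
```

That works out to roughly 477 MB/s, consistent with the "over 400MB/sec" estimate.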

    $ less k.txt

This works just fine; when I press > to go to the end of the file, it goes there immediately, but stays busy calculating line numbers (it's scanning the whole file); if I cancel with Ctrl+C, it stops, just like it does on other Unix OSes.

PS: It's the mingw tools that don't work properly! I tried them a couple of times, but all the incompatibilities made me give up pretty quickly.

Thanks, that's very interesting; maybe I should give Cygwin another try. The last time was a couple of years ago and I had all the problems I mentioned, so I decided to wait for a 64-bit version before trying it again...

If I had to guess, I'd say somehow you ended up with text-mode mounts in your previous experience. The default, and recommended, is binary mode, but you're given a choice on install. It affects C programs that specify "t" to fopen() and friends, and causes Cygwin to convert line endings to and from DOS. But it's more trouble than it's worth.
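One way to check for text-mode mounts, sketched under the assumption that Cygwin's mount prints each mount's options in trailing parentheses with either "binary" or "text" among them. The function name text_mode_mounts is hypothetical.

```shell
#!/usr/bin/env bash
# text_mode_mounts: hypothetical filter. Reads mount output on stdin and
# prints only the lines whose parenthesised option list contains "text".
text_mode_mounts() {
    grep -E '\((.*,)?text(,.*)?\)$'
}

# Under Cygwin only: report any text-mode mounts.
if command -v cygpath >/dev/null 2>&1; then
    mount | text_mode_mounts || echo "no text-mode mounts found"
fi
```

If every line shows binary, the line-ending translation described above is not in play.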