by hawski 815 days ago
I have known this about Unix pipes for a very long time. Whenever they are introduced it is always mentioned, but I guess people can miss it.

Though now I will break your mind as my mind was broken not long ago. PowerShell, which is often said to be a better shell, works like that. It doesn't run things in parallel. I think the same can be said about Windows cmd/batch, but don't quote me on that. That one thing makes PowerShell insufficient to ever be a full replacement for a proper shell.


Not exactly. Non-native PowerShell pipelines are executed in a single thread, but the steps are interleaved, not buffered. That is, each object is passed through the whole pipeline before the next object is processed. This is non-ideal for high-performance data processing (e.g. `cat`ing a 10GB file, searching through it and gzipping the output), but for 99% of daily commands, it does not make any difference.
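
A minimal sketch of that interleaving, assuming PowerShell 5.1 or later. The `$log` list and the stage labels are made up for illustration. Each stage records when it sees an object, so the order of the entries shows how objects move through the pipeline:

```powershell
# Hypothetical log, used only to record the order in which the stages run.
$log = [System.Collections.Generic.List[string]]::new()

# Stage A records the object and passes it on; stage B only records it.
1..3 |
    ForEach-Object { $log.Add("stage A: $_"); $_ } |
    ForEach-Object { $log.Add("stage B: $_") }

$log
```

The entries alternate (A 1, B 1, A 2, B 2, ...), which is the one-object-at-a-time behavior described above, all on a single thread.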

cmd.exe uses standard OS pipes and behaves the same as UNIX shells, as does PowerShell when invoking native binaries.

Oh, that's what I missed! I managed to find out about it while trying to do an equivalent of `curl ... | tar xzf -` in PowerShell. I was stumped. I guess the thing is that a Unix shell would do a subshell automatically.

> Though now I will break your mind as my mind was broken not long ago. PowerShell, which is often said to be a better shell, works like that. It doesn't run things in parallel. I think the same can be said about Windows cmd/batch, but don't quote me on that. That one thing makes PowerShell insufficient to ever be a full replacement for a proper shell.

A pipeline in PowerShell is definitely streaming unless you accidentally force the output into a list/array at some point. For example, try this for yourself (somewhere you can interrupt the script, obviously, as it's going to run forever):

    class InfiniteEnumerator : System.Collections.IEnumerator
    {
        hidden [ulong]$countMod2e64 = 0

        [object] get_Current()
        {
            return $this.countMod2e64
        }
        
        [bool] MoveNext() {
            $this.countMod2e64 += 1
            return $true
        }
        
        Reset() {
            $this.countMod2e64 = 0
        }

    }

    class InfiniteEnumerable : System.Collections.IEnumerable {
        InfiniteEnumerable() {}
        
        [System.Collections.IEnumerator] GetEnumerator() {
            return [InfiniteEnumerator]::new()
        }
    }

    [InfiniteEnumerable]::new() | ForEach-Object { Write-Host "Element number mod 2^64: $_" }

Whether it runs in parallel depends on the implementation of each side. Interpreted PowerShell code does not run in parallel unless you run it as a job, use ForEach-Object -Parallel, or explicitly put it on another thread. But the data is not collected together before being sent from one step to the next.
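
For contrast, a small sketch of what forcing the output into a collection looks like, again with a made-up `$log` list. Wrapping the first stage in parentheses is just one way to do it; the assumption here is only that a parenthesized pipeline is evaluated to completion before its result is used:

```powershell
# Hypothetical log, used only to record the order in which the stages run.
$log = [System.Collections.Generic.List[string]]::new()

# The parentheses make PowerShell run the inner pipeline to completion
# and collect its output before the outer pipeline starts.
(1..3 | ForEach-Object { $log.Add("stage A: $_"); $_ }) |
    ForEach-Object { $log.Add("stage B: $_") }

$log
```

Here all of the stage A entries come before any stage B entry, so the streaming is gone even though it is still written with a pipe.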

A more compact example (not to scare the POSIX people away :) ):

    0..1000000 | where {$_ % 10 -eq 0} | foreach {"Got Value: $_"}

The streaming behavior of the range operator is weird, though. This was tested on PowerShell 7.4.1:

    > 0..1000000000 | % { $_ }
    # Starts printing out numbers immediately
    > 0..1000000000
    # Hangs longer than I had patience to wait for
    > $x=0..100
    > $x.GetType()
    # IsPublic IsSerial Name     BaseType
    # -------- -------- ----     --------
    # True     True     Object[] System.Array

It's an array when I save it in a variable, but it's obviously not an array on the LHS of a pipe.
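
One way to poke at this without waiting on a billion lines of output, as a sketch that assumes PowerShell 7.x as in the test above (on Windows PowerShell 5.1 the range may be built as a full array first, so this could stall or run out of memory there). The `$firstThree` name is just for illustration:

```powershell
# Select-Object -First stops the upstream pipeline once it has enough objects,
# so this only returns promptly if the range is enumerated lazily.
$firstThree = 0..1000000000 | Select-Object -First 3

$firstThree -join ','
```

If the range on the left of the pipe is produced lazily, this comes back right away with the first three numbers, which fits the "starts printing immediately" observation.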