Hacker News
by BoppreH 1029 days ago
It's a nice tool, but it also shows the shortcomings of shell commands.

In a proper programming language, we'd have something like

    parallel [1..5], i => { sleep random()*10+5; possibly_flaky i }
    // [{"Seq": 4, "Host": ":", "Starttime": 1692491267...

And `parallel` would only have to worry about parallelization.
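
For comparison, here is roughly what that hypothetical `parallel [1..5], i => {...}` looks like in an existing general-purpose language. This is a minimal Python sketch using `concurrent.futures`, with several assumptions. `possibly_flaky` is a hypothetical stand-in for the real job. The sleep is scaled down so the example runs quickly. The record fields are only loosely modeled on the joblog-style output shown above.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def possibly_flaky(i: int) -> int:
    """Hypothetical stand-in for the real job: fails for one input, squares the rest."""
    if i == 3:
        raise RuntimeError(f"job {i} failed")
    return i * i


def run_job(seq: int) -> dict:
    """Run one job and describe it as a record instead of CSV-like text."""
    start = time.time()
    time.sleep(random.random() * 0.01)  # scaled-down stand-in for `sleep random()*10+5`
    try:
        result, error = possibly_flaky(seq), None
    except RuntimeError as exc:  # keep the failure in the record instead of aborting the batch
        result, error = None, str(exc)
    return {
        "Seq": seq,
        "Starttime": start,
        "JobRuntime": time.time() - start,
        "Result": result,
        "Error": error,
    }


def parallel(items, fn, max_workers: int = 5) -> list:
    """Apply fn to every item on a thread pool; the results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


if __name__ == "__main__":
    for record in parallel(range(1, 6), run_job):
        print(record)
```

Here `parallel` really does only the parallelization. Separators, templating, and output format are ordinary function arguments and return values. Threads fit jobs that mostly sleep or wait on subprocesses. For CPU-bound work, a `ProcessPoolExecutor` would be the usual swap.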

Instead, the shell environment forces programs to invent their own parameter separator (:::), a templating format ({1}), and a way to output a list of structures (CSV-like). You can see the same issues in `find`, where the exec separator is `\;`, the template is `{}`, and the output is delimited by \n or \0. And `xargs` does it in yet another way.
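
To make the contrast concrete, here is a small Python sketch of what `find ROOT -type f -name PATTERN -exec CMD {} \;` does, without any separator, template, or delimiter conventions. File names travel as list elements, so names with spaces or newlines need no escaping. The function names `find_files` and `exec_each` are hypothetical. The match is only approximate: `Path.is_file()` follows symlinks, while `-type f` does not.

```python
import subprocess
import sys
from pathlib import Path


def find_files(root: str, pattern: str) -> list:
    """Roughly `find ROOT -type f -name PATTERN`, as Path objects (no newline/NUL delimiters)."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def exec_each(paths, argv_prefix) -> list:
    r"""Roughly `find ... -exec CMD {} \;`: run CMD once per path and collect its stdout.

    Each path is appended as a single argv element, so there is no `{}` template,
    no `\;` terminator, and no quoting of unusual file names.
    """
    outputs = []
    for path in paths:
        proc = subprocess.run(
            [*argv_prefix, str(path)], capture_output=True, text=True, check=True
        )
        outputs.append(proc.stdout)
    return outputs
```

As a usage example, `exec_each(find_files(".", "*.txt"), ["wc", "-l"])` would stand in for the `find ... -exec wc -l {} \;` idiom, assuming `wc` is on PATH.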

It's very hard to acquire and retain mastery over a toolbox where every tool reinvents the basics. If you've ever found yourself searching "find exec syntax" several times in a week, it's not your fault.

As for alternatives, I'm a fan of YSH[1] (JavaScript-like), Nushell[2] (reinvented from first principles for simplicity and safety) and Fish[3] (bash-like but without the footguns). Nushell is probably my favorite of the bunch; here's a parallel example:

    ls | where type == dir | par-each { |it|
        { name: $it.name, len: (ls $it.name | length) }
    }

[1] https://www.oilshell.org/release/latest/doc/ysh-tour.html

[2] https://github.com/nushell/nushell

[3] https://fishshell.com/
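
For readers who don't use Nushell, here is a rough Python equivalent of that `par-each` pipeline. It lists the directories in a given directory, counts the entries in each one concurrently, and returns records rather than text. This is a sketch, and `dir_entry_counts` is a hypothetical name. One difference: `os.listdir` counts hidden entries, whereas Nushell's `ls` omits them unless `--all` is given.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def dir_entry_counts(root: str = ".") -> list:
    """Roughly `ls | where type == dir | par-each { |it| {name: ..., len: ...} }`."""
    # `where type == dir`: real directories only (Nushell reports symlinks as a separate type)
    with os.scandir(root) as it:
        dirs = sorted(e.path for e in it if e.is_dir(follow_symlinks=False))

    def count(path: str) -> dict:
        # `ls $it.name | length`, except that hidden entries are included here
        return {"name": os.path.basename(path), "len": len(os.listdir(path))}

    # par-each: one task per directory on a thread pool
    with ThreadPoolExecutor() as pool:
        return list(pool.map(count, dirs))
```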

[I'm not recommending this, but maybe… No, no. I'm not sure…]

It isn't just the newer shells that have solved this; zsh also has a solution out of the box¹. The extensive globbing support in zsh can largely replace `find`, and things like zargs let you reuse your common knowledge throughout the shell.

For example, doing your first example with zargs would use regular option separators (`--`), regular brace expansion (`{1..5}`), and standard shell constructs for the commands to execute.

I'll contrive an example based on your file counter, but slightly different to show some other functionality.

    autoload -Uz zargs
    f() { fs=($1/*(.N)); jo $1=$#fs }
    zargs -P 32 -n1 -- **/*(/) -- f

That should recursively list directories, counting only the plain files within each, and output² JSONL that can be further mangled within the shell³. You could just as easily populate an associative array for further work, or $whatever. Unlike bash, zsh has reasonable behaviour around quoting and whitespace too.

Edit to add: I'm not suggesting zargs is a replacement for parallel, but if you're only using a small subset of its functionality then it may be able to replace that.

¹ https://zsh.sourceforge.io/Doc/Release/User-Contributions.ht...

² https://github.com/jpmens/jo

³ https://github.com/stedolan/jq
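
For comparison, here is a rough Python equivalent of the zargs example. It walks every non-hidden directory below a root, counts only the plain, non-hidden files directly inside each, and produces both an associative array (a dict) and one JSON object per line, as `jo dir=count` would. This is a sketch under stated assumptions: the function names are hypothetical, `jobs=32` mirrors `-P 32`, and the hidden-file and symlink handling only approximates zsh's default globbing.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def subdirs(root: str):
    """Roughly zsh `**/*(/)`: non-hidden directories below root, sorted, symlinks not followed."""
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False) and not entry.name.startswith("."):
            yield entry.path
            yield from subdirs(entry.path)


def count_files(path: str) -> int:
    """Roughly `fs=($1/*(.N)); $#fs`: plain, non-hidden files directly inside path."""
    with os.scandir(path) as it:
        return sum(
            1 for e in it if e.is_file(follow_symlinks=False) and not e.name.startswith(".")
        )


def file_counts(root: str = ".", jobs: int = 32) -> dict:
    """Associative-array form {relative_dir: count}, counted `jobs` directories at a time."""
    dirs = list(subdirs(root))
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return {os.path.relpath(d, root): n for d, n in zip(dirs, pool.map(count_files, dirs))}


def to_jsonl(counts: dict) -> str:
    """JSON Lines form: one {"dir": count} object per line."""
    return "\n".join(json.dumps({d: n}) for d, n in counts.items())
```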

What you mention is the main reason why shell script is not a good language for writing long programs. It is full of inconsistencies, and since it depends on other commands, you have to learn the quirks of each command you use. Moreover, good luck if you need to debug it. Shell should only be used for small scripts that are easy to debug.

If doing even simple things requires looking up documentation, why does it matter whether the shell script is long or short?

Spending extra time on simple things (because you need to Google, e.g., "how to pass multiple space-separated arguments from a string to a command") is a waste of time too.
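
For what it's worth, that particular question has a short answer in Python. `shlex.split` applies shell-style quoting rules to a string and returns an argument list that can go straight to `subprocess.run`, with no word-splitting surprises. This is a minimal sketch; the argument string is only an illustration, and the child process is Python echoing its own argv so the example does not depend on any other command.

```python
import shlex
import subprocess
import sys

# A string holding several space-separated arguments, one of which contains a space.
arg_string = "alpha 'beta gamma' delta"

# shlex.split honours shell-style quoting: three arguments, not four.
args = shlex.split(arg_string)
print(args)  # ['alpha', 'beta gamma', 'delta']

# Pass them to a command as a list: no re-splitting and no extra quoting needed.
proc = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", *args],
    capture_output=True,
    text=True,
    check=True,
)
print(proc.stdout.strip())  # ['alpha', 'beta gamma', 'delta']
```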

Because the shell is available everywhere. A programming language needs to be installed first to be of any use. I still see a need for a simple command processor, independent of any programming language, for simple tasks.

> A programming language needs to be installed first to be of any use.

I have that taken care of already

This works for you, but many people don't have the luxury of knowing which languages are installed on another system.

Can you recommend a good alternative for when your shell program gets too large?

Honest question, as I’m struggling to leave the shell environment once the program gets too large. I could use Perl, but $? and the like quickly get out of hand. Python’s support for pipes was difficult last time I used it, but that may have changed. What would you recommend?

If it's too large, then just write normal Python code. It'll be a lot longer than the equivalent shell-like script, but you'll gain it back in maintenance effort, debuggability, and robustness.
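
To give an idea of what "normal Python" looks like when a pipe is still needed, here is a minimal sketch of a helper that chains commands like `cmd1 | cmd2`. It connects each process's stdout to the next one's stdin with `subprocess.Popen`, following the pattern in the `subprocess` documentation. The helper name `pipeline` is hypothetical, and error handling is limited to checking exit statuses.

```python
import subprocess
import sys


def pipeline(*commands: list) -> str:
    """Run argv lists as `cmd1 | cmd2 | ...` and return the last stage's stdout.

    No shell is involved, so nothing needs quoting. In the spirit of bash's
    `set -o pipefail`, a non-zero exit status from any stage raises CalledProcessError.
    """
    procs = []
    stdin = None  # the first stage inherits this script's stdin
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE, text=True)
        if procs:
            # Drop our copy of the previous stage's stdout so that stage can
            # receive SIGPIPE if this one exits early.
            procs[-1].stdout.close()
        procs.append(proc)
        stdin = proc.stdout
    output, _ = procs[-1].communicate()
    for argv, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    return output


if __name__ == "__main__":
    # Equivalent of `printf 'b\na\nc\n' | sort`, using Python for both stages
    # so the example runs anywhere.
    produce = [sys.executable, "-c", "print('b'); print('a'); print('c')"]
    sort_lines = [sys.executable, "-c", "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"]
    print(pipeline(produce, sort_lines), end="")
```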

You have some hesitation with Perl, but if you stick with it, you'll find what you seek. It feels very 'unixy' and can achieve much the same as shell while being more consistent in its syntax. Its portability means it will work the same across environments. Plus, the newest editions have niceties like modern classes and try/catch as built-in language features.

Sharing this because it's the route I went: anything I'd have written in Bash, I now do in Perl.

Thank you for encouraging me to use Perl. After Perl 6 came out, I got confused about what to use and how, so I abandoned that path. I’ll try once more now.

The tooling around Perl has also improved over the last decade or so, and it still lets you pack everything up to run even on ancient machines with an old Perl 5.

On Unix, you might try Nim (https://nim-lang.org) with https://github.com/Vindaar/shell, and there is a slew of pipeline-y/Unix-y utilities over at https://github.com/c-blake/bu

Nim is statically typed and (generally) native-compiled, but it has very low-ceremony ergonomics and a powerful compile-time macro/template system, as well as user-defined operators. For example, you can use `+-` to make a constructor for uncertain values, so that `9 +- 2` builds a typed object, as in https://github.com/SciNim/Measuremancer

Thanks for pointing me to Nim, it looks promising. I’ll try to use https://nim-lang.org/docs/osproc.html to pipe programs.

My use case is roughly this: I can get 80% of what I want with `ls … | sed … | grep -v …`, but then the script gets complicated, and I’d like to replace the sed or grep part with a program.

This sounds like a job for what C (via POSIX) calls "popen". You can do

    import posix; for line in popen("ls", "r").lines: echo line

in Nim, though you obviously need to replace `echo line` with your desired processing and learn how to do that.
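
Here is the same streaming pattern in Python, for comparison. It reads a command's output line by line as it is produced and does the `sed`/`grep -v` work inside the program. This is a sketch with assumptions: the function name `filtered_lines` is hypothetical, the patterns stand in for whatever the elided `sed …` and `grep -v …` were doing, and they use Python `re` syntax rather than sed's or grep's basic regular expressions.

```python
import re
import subprocess
import sys


def filtered_lines(argv, sed_pattern: str, replacement: str, grep_v_pattern: str):
    """Stream argv's stdout line by line (like popen), then apply, in order,
    `sed 's/PATTERN/REPLACEMENT/'` and `grep -v PATTERN` in-process."""
    sed = re.compile(sed_pattern)
    grep_v = re.compile(grep_v_pattern)
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            # sed without /g replaces only the first match on each line
            line = sed.sub(replacement, line.rstrip("\n"), count=1)
            if not grep_v.search(line):  # grep -v: keep lines that do NOT match
                yield line
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)
```

As a usage example, `for line in filtered_lines(["ls"], r"\.txt$", ".md", r"^tmp"): print(line)` would list the current directory, rename `.txt` suffixes to `.md` in the output, and drop names starting with `tmp`. Those patterns are made up for illustration.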

You might also want to consider `rp`, a program generator-compiler-runner along the lines of `awk`, but with all the code being Nim snippets interpolated into a program template: https://github.com/c-blake/bu/blob/main/doc/rp.md . E.g.:

    $ ls -l | rp -pimport\ stats -bvar\ r:RunningStat -wnf\>4 r.push\ 4.f -eecho\ r
    RunningStat(
      number of probes: 26
      max: 31303.0
      min: 23.0
      sum: 84738.0
      mean: 3259.153846153846
      std deviation: 6393.116633069013
    )
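
For readers without Nim, here is roughly the same computation using Python's standard library. It skips lines with four or fewer fields (such as the `total` line of `ls -l`), takes the fifth field (the size column) as a number, and summarizes it. This is a sketch with assumptions: it reads rp's `4.f` as the zero-based fifth field, it uses the population standard deviation (which may or may not match what `RunningStat` prints), and `size_stats` is a hypothetical name.

```python
import statistics


def size_stats(lines) -> dict:
    """Summarize field 5 (0-based index 4) of every line with more than 4 fields."""
    sizes = []
    for line in lines:
        fields = line.split()
        if len(fields) > 4:  # like `-w nf>4`: skip short lines such as `total 123`
            sizes.append(float(fields[4]))
    return {
        "number of probes": len(sizes),
        "max": max(sizes),
        "min": min(sizes),
        "sum": sum(sizes),
        "mean": statistics.fmean(sizes),
        "std deviation": statistics.pstdev(sizes),  # population standard deviation
    }
```

Saved in a script, `size_stats(sys.stdin)` could then be fed from `ls -l | python3 script.py`.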

Thank you! Popen looks like what I was looking for!

I use Go. You can run scripts directly with `go run`, and this package makes shell tasks easy: https://github.com/bitfield/script

Script looks promising, thank you! I’ll give it a try, as a sister comment also suggests Go.

Good ergonomics for Perl-style quick-and-dirty text processing were part of Ruby's original design goals. Those parts of the language are still there. You can write code that feels more concise than Python yet, IMO, tends to be more readable and maintainable than Perl stereotypically is. Modern style guides, however, de-emphasize that style of Ruby, since it might not be the most appropriate in the context of, say, a large Rails project.

PowerShell uses proper objects instead of stringly typed nonsense.

Perl6/Raku is my personal choice (when I can).

Python, to me, is too far removed from shell/Unix. It is a programming language for writing applications. For the use case of writing shell scripts in a more powerful language, Perl is still king (or it should be; sadly, that doesn't appear to be the case, and no one uses it except die-hard graybeards).

Raku is a modern (though still big) kitchen-sink language. Again, there doesn't appear to be much uptake.

Elk is a shell language with syntax similar to Python: https://elk.strct.net

This looks very interesting! Thank you!

Unpopular opinion, but I used Haskell "scripts" with relative success for a while. Stack has a nice script interpreter mode that is runnable in the familiar #! way.

It even lets you add dependencies and, if necessary, compile the script on the fly.

I once read an HN thread that recommended Go for this, and it got me interested. I think it was a useful suggestion: it made me learn Go, and 5+ years later I still kind of agree with it. It is not a shell, but it is simple, fast, and useful.

Python.

Just the inclusion of argparse alone is worth it IMO.
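
As a small illustration, here is a sketch of the option handling a shell script would do with a `getopts` loop, written with `argparse`. Typed options, defaults, positional arguments, and a generated `--help` come for free. The program name and options are made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Options a shell script might otherwise parse with a getopts loop."""
    parser = argparse.ArgumentParser(
        prog="count-files",  # hypothetical tool name
        description="Count the files in each directory.",
    )
    parser.add_argument(
        "-P", "--jobs", type=int, default=4,
        help="number of parallel workers (default: %(default)s)",
    )
    parser.add_argument("-v", "--verbose", action="store_true", help="report progress on stderr")
    parser.add_argument("dirs", nargs="+", help="directories to scan")
    return parser


if __name__ == "__main__":
    # A real script would call parse_args() with no arguments to read sys.argv;
    # an explicit list keeps this demo self-contained.
    args = build_parser().parse_args(["-P", "8", "src", "docs"])
    print(args.jobs, args.verbose, args.dirs)  # 8 False ['src', 'docs']
```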

> Python’s support for pipes was difficult

Well, the idea would be to replace a lot of your pipe usage.
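
For example, a typical `grep ERROR app.log | sort | uniq -c | sort -rn | head -3` pipeline needs no external processes in Python. `collections.Counter` covers the `sort | uniq -c | sort -rn | head` part. This is a sketch: the log lines are invented, matching is a plain substring test (like `grep -F`), and tie ordering may differ from `sort -rn`.

```python
from collections import Counter


def top_matching_lines(lines, needle: str, n: int = 3) -> list:
    """In-process version of `grep -F NEEDLE | sort | uniq -c | sort -rn | head -n N`."""
    matching = (line.rstrip("\n") for line in lines if needle in line)  # the grep step
    return Counter(matching).most_common(n)  # count, sort by frequency, keep the top n


if __name__ == "__main__":
    log = ["ERROR disk full\n", "INFO started\n", "ERROR timeout\n", "ERROR disk full\n"]
    for line, count in top_matching_lines(log, "ERROR"):
        print(count, line)
```

With a real file, `with open("app.log") as f: top_matching_lines(f, "ERROR")` streams it line by line. The file name is hypothetical.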

Off the wall, but Scala has a concise syntax for process operations; startup time is likely prohibitive, though.

GNU Parallel is also written in Perl, so its footprint is quite large.

GNU xargs implements limited parallelization (`-P`) and is compiled C. This functionality is also present in BusyBox, including the Windows version.

https://www.linuxjournal.com/content/parallel-shells-xargs-u...

GNU Parallel has much greater functionality, but it is not as widely available as xargs.

> GNU Parallel is also based on perl

Time to rewrite it in Rust /s

:p

There have been multiple ports already, I believe.

Meanwhile, Python's Dask is very well funded, cloud-native, and also runs locally. However, it relies on a Python runtime, so you know... Also, I'm not sure about Dask's license terms.

Your find exec problem can be trivially solved either with `-exec /bin/bash -c "script"`, or by spending a little extra time figuring out how to structure your scripts properly, so that the invocations just flow with little more than an invocation plus getopts.

If you feel like the answer is rewriting the shell, the answer is practically never rewriting the shell. It's learning to use it.

Do you mean like:

    parallel 'sleep {= $_=rand()*10+5; =} ; possibly_flaky {}' ::: {1..5}

The {= =} escapes to Perl, so you have a full programming language available.