by rtpg 4156 days ago
Isn't this a pretty good argument against shell scripts? I feel like we've advanced far enough in PL research to think of something a bit safer
4 comments

Time for advertisement! You might like elvish https://github.com/elves/elvish which proudly has optional typing and much more well-defined semantics than, say, bash. This is a work in progress though.

Advertisement aside, there is some inherent unsafety in shell scripts that cannot be easily resolved, namely the unsafety involved in interacting with external commands.

Compared to other scripting languages, the greatest advantage of shell languages is the convenience of interacting with external programs. However, at least in Unix, there are few static constraints you can apply to them. All we know is that the program will (probably) parse something in argv, which is just bytes, (probably) take something from stdin, which is just bytes, and (probably) put something on stdout, which is again just bytes; there is no universal method to check that the command's arguments are well-formed, or that the input format is correct, or that the output conforms to a certain schema without running the actual program. One solution would be to define some kind of static protocol for external programs so that their invocations can be statically checked, but it's already too late for that.
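To make the "just bytes" point concrete, here is a minimal Python sketch (Python is only my choice for illustration, it is not from the thread). The child program `CHILD` and the helper `run_bytes` are hypothetical names. The point is that nothing about the argument, input, or output format can be checked before the program actually runs.

```python
import subprocess
import sys

# Hypothetical child program: upper-cases whatever arrives on stdin.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def run_bytes(argv, stdin_bytes):
    """Run an external command; everything crossing the boundary is bytes."""
    proc = subprocess.run(argv, input=stdin_bytes, capture_output=True)
    return proc.returncode, proc.stdout

code, out = run_bytes([sys.executable, "-c", CHILD], b"hello\n")
# Whether `out` holds a filename, JSON, or garbage can only be learned by
# inspecting it after the program has run; no type says so in advance.
print(code, out)
```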

Interesting! On first glance, this might be the most appealing attempt to improve on the shell I've seen yet. It's a really hard design space. Some questions:

Why is set necessary? Once you have declaration with var, can't mutation be done without set?

What's your thinking behind making var mandatory for declarations? Safety is obvious, but it seems like terseness is a really big goal for shell programming, especially interactive use.

Also, documentation wise, I don't see how/if you do variable expansion in strings. Same as sh?

var is for declaration, set for assignment. This is an important contrast that some dynamic languages miss; ironically JavaScript got it right. Contrast this

    var $x = "foo"; if $true { set $x = "bar" }; echo $x # outputs "bar"
with

    var $x = "foo"; if $true { var $x = "bar" }; echo $x # outputs "foo"
The declaration/assignment contrast is very important when it comes to closures (and there are closures in elvish). In Python 2, for instance, there is no way (!) to assign to outer variables in closures, since `=` declares and assigns at the same time inside a `def` block.
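A small Python sketch of the closure problem described above. The function names are made up for illustration. The first function behaves the same way in Python 2 and 3; the second relies on `nonlocal`, which only exists in Python 3 and is effectively the missing "set".

```python
def outer_with_shadowing():
    count = 0
    def bump():
        # `=` here declares a brand-new local `count`;
        # the outer variable is left untouched.
        count = 1
        return count
    bump()
    return count

def outer_with_nonlocal():
    count = 0
    def bump():
        nonlocal count  # Python 3 only: `=` now assigns to the outer variable
        count = 1
        return count
    bump()
    return count

print(outer_with_shadowing())  # 0
print(outer_with_nonlocal())   # 1
```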

There are no variable expansions, but strings are concatenated implicitly when they run together. In sh:

    echo "hello $name, welcome!"
In elvish:

    echo "hello "$name", welcome!"
Implicit concatenation can read a bit weird at first, but it's actually conceptually much simpler and only slightly more cumbersome than string interpolation. It also makes the syntax much simpler.
JavaScript didn't really get it right:

  var a = 1;

  function four() {
    if (true) {
      var a = 4;
    }

    alert(a); // alerts '4', not the global value of '1'
  }
Also, if you omit 'var', the code is still legal (except in strict mode), and the variable winds up in the global scope, which is a recipe for disaster.

Still, it's nice that 'var' exists in JS at all. The idea that variable declarations are unnecessary noise and should be elided -- an idea that dates back at least to BASIC -- is, in my opinion, one of the worst seductive ideas in programming language design. Unless your language has only a single global scope (like BASIC), it always causes problems -- and we know that block scope is important for nontrivial programs.

Elvish sounds interesting; I will have to check it out.

I don't think we know that. We know some notion of lexical scope is valuable, but function only vs block scope seems like an issue of familiarity/style.
Okay, but that doesn't change my point that a single global scope is archaic, and for good reason.
So I see why you want var, but can't you just treat "x = 5" as "set x = 5"?

You can also remove the need for var (at the expense of no safety for typos) by prohibiting shadowing, like coffeescript does.

The need for `set` has to do with syntax. In shells the first word of a statement is always considered to be the command, so "x = 5" will not work - it reads "execute command 'x' with arguments '=' and '5'". The traditional solution is to treat the command as an assignment when it contains '=', so you write "x=5" and you are prohibited from adding any spaces, which I find aesthetically very unpleasant.
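The traditional rule can be modelled roughly as follows. This is a simplified sketch in Python, not how any real shell is implemented: `classify` is a hypothetical helper, `shlex.split` stands in for the shell's word splitting, and forms like `x=5 cmd` (assignment as a prefix to a command) are ignored.

```python
import re
import shlex

# sh-style rule: a first word of the form NAME=value is an assignment.
ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")

def classify(line):
    """Return ('assign', name, value) or ('command', words) for one simple line."""
    words = shlex.split(line)
    if words and ASSIGNMENT.match(words[0]):
        name, _, value = words[0].partition("=")
        return ("assign", name, value)
    return ("command", words)

print(classify("x=5"))    # ('assign', 'x', '5')
print(classify("x = 5"))  # ('command', ['x', '=', '5'])
```

With spaces, the line splits into three words and the first one, `x`, is taken as the command name, which is exactly the behaviour described above.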

I was not aware of the CoffeeScript approach towards shadowing before. I will look into it, but it seems to be a very controversial design choice of CoffeeScript.

Re coffeescript and shadowing: I don't like it because I dislike the mismatch in semantics with JavaScript and am used to Python, but beyond that, I'm not sure it's wrong.

Re set: it would slightly complicate your grammar, but I don't think detecting "word space* = ..." would create any ambiguities.

Partially this issue is just about how much you value explicitness/regularity vs. concision.

@hyperpape we seem to have hit the critical level for flame war and now I cannot reply to you :) it's said I will be able to reply after some cooldown time, but here is my reply:

re "word space* = ...": should this echo an equal sign and $ip, or assign $ip to $echo?

    echo =$ip
I think a decent solution is to have a shell alternative that defines interfaces for known programs (much like autocomplete scripts do now).

I know ls returns a list of files, so I should be able to use that. I don't know foo, so it's basically string -> string or whatever, but if an enterprising spirit does know about foo, he could write an abstraction layer for it.

The trick is making a simple interface for that

Yes, one thing that can be done (in future) with elvish is writing wrappers for external commands that run the commands and convert their (bytes) output to strongly typed values.

For instance, `ls` outputs a bunch of lines where each line is supposedly a single file name, but this breaks when some file name contains a `\n`. It is possible to use `ls -b` to escape special characters, but now you have to un-escape the filenames when you pass them to other commands. With elvish it is possible to write a wrapper around `ls` that actually outputs a list (yes there are lists in elvish) of strings and each member of the list can be passed around without un-escaping.

Also there's a question about whether you can properly deal with option hell. Parsing every possible output of ls reliably in the face of malicious filenames sounds...fun.

Edit: It's really impossible to avoid edge cases. Take find: you can't parse it, because it's just a list of filenames separated by \n. But filenames can contain just about any character. How do you handle /home/bar\n/tmp?

Maybe you just ignore pathological input, but now you're regressing towards the state of bash.

Option hell is indeed a problem.

The problem with `find` happens to have a solution (-print0). However, it is a PITA to deal with \0-separated strings in traditional shells, unless you pipe them to another command that happens to recognize \0-separated strings.

With elvish you can parse the \0-separated strings output by `find ... -print0` into a genuine list - not lines (which are \n-separated strings) or \0-separated strings, but real lists that support indexing, iteration, etc., and there is absolutely no chance that two consecutive items will run together or that one item will be treated as two. Imagine how fantastic it is to deal with that :)
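The difference between the two separators can be sketched in a few lines of Python (not elvish; the helper names are hypothetical and the byte strings simulate what `find` would emit rather than calling it). The sample path is the `/home/bar\n/tmp` case raised earlier in the thread.

```python
from typing import List

def split_lines(data: bytes) -> List[bytes]:
    """Naive parse: treat the output as newline-separated names."""
    return [p for p in data.split(b"\n") if p]

def split_nul(data: bytes) -> List[bytes]:
    """Parse `find ... -print0` style output: names terminated by NUL."""
    return [p for p in data.split(b"\0") if p]

# Two paths, the first of which contains a newline.
names = [b"/home/bar\n/tmp", b"/etc/passwd"]

as_lines = b"".join(n + b"\n" for n in names)  # simulated -print output
as_nul = b"".join(n + b"\0" for n in names)    # simulated -print0 output

print(split_lines(as_lines))  # three items: the first name was cut in two
print(split_nul(as_nul))      # two items, identical to `names`
```

NUL works as a separator because it is one of the few bytes that cannot appear in a Unix path. The result of `split_nul` is an ordinary list, so indexing and iteration work without any un-escaping.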

Well, there's plenty of legacy code lying around that may be easier to improve and fix than to completely rewrite. In fact, I just presented this tool to my co-workers as a suggestion for cleaning up the more than 150 KLOC of shell scripts we have lying around.
Be interesting to run this checker on the default scripts that come with major UNIX OSes like MacOSX and Ubuntu and Fedora and the like - sounds like a great janitorial project...
I've been using this and it's pretty nice. See a real world example in this pull request:

https://github.com/Gabriel439/Haskell-Turtle-Library/commit/...

The Xerox PARC and ETHZ answer to that would be REPL instead of shell.
A shell is a REPL. The problem is that the shell is sloppy in that it only deals with bytes and cannot process any complex data structure.

Again, advertisement for my side project https://github.com/elves/elvish, a Unix shell with true data structures. Still a WIP though.

There is already a well-developed shell with rich data structures and a fairly reasonable programming language: Microsoft's PowerShell. Sadly it is not a Unix shell. You're probably aware of it, but if not, check it out for design inspiration.
Of course! PowerShell definitely has a lot of brilliant ideas. Sadly it is overengineered and has quite a few design mistakes. Nevertheless it has served as a great source of inspiration for me - I actually went through several PowerShell manuals before I started elvish.
As someone who loves PowerShell and uses it daily, may I ask for specifics on the design mistakes and overengineering? You may also answer by email if you want.

Don't get me wrong, I realise it has its flaws and warts, but for me, and comparing to cmd or bash I still think it's very, very much an improvement.

Off the top of my head actual mistakes (the sort that tends to bite many people) include handling of [ and ] in -Path arguments (necessitating -LiteralPath arguments in later versions), and the constant wondering whether something returns a scalar or an array (and an array of one element being unwrapped into a scalar automatically). During my time working on Pash I also noted a few weirdnesses on source code side, most recently and notably LanguagePrimitives.Convert which has a dependency on the currently-executing runspace (which is stored in a thread-local field).

> A shell is a REPL.

Only when it allows the same expressive power over the OS as Lisp Machines, Interlisp-D, Cedar, Oberon have over the running environment.

The only mainstream modern shell that approaches that is Powershell.

Edit: forgot to say good luck for your project

You don't need support for complex data types for a shell to be described as "REPL".

A REPL is just a Turing-complete real-time interpreter. Which means even the VBA "Immediate" panel (as seen in MS Office) is a REPL. And it means Bash is a REPL too.

The question you're raising is whether all REPLs are equal. Lisp machines definitely had more control over the host than VBA does. But that doesn't mean that VBA's Immediate panel isn't a REPL just because a more powerful example exists.

As for Bash, that's a bit of a weird one because Bash wouldn't be much without the accompanying GNU / POSIX userland. But if you're willing to include a UNIX / Linux userland in scope, then Bash has just as much control over the host as Lisp did on Lisp machines. And even without the aid of forking additional executables, Bash can still modify the state of the kernel directly, e.g.

    echo 0 > /proc/sys/vm/swappiness
    echo 3 > /proc/sys/vm/drop_caches
(For those who may not have been aware, echo is a built in command in Bash)
You are missing the part about manipulating other applications or controlling GUI elements, like PowerShell kind of allows via DLL interop and OLE Automation.

As for /proc/sys, not all UNIXes have such features.

In Oberon I could pipe selected text from any application into any command that had a GUI aware type signature, for example.

> You are missing the part about manipulating other applications or controlling GUI elements, like PowerShell kind of allows via DLL interop and OLE Automation.

There are lots of command line hooks for GUIs. Want to copy data to the clipboard from the command line? xclip. Want to pop up a notification in your desktop environment's notification bar? notify-send "hello world!" etc

> As for /proc/sys, not all UNIXes have such features.

That example of mine was clearly taken from Linux, so it goes without saying that most UNIXes would behave differently in that specific regard. Even so, they'd still have command line tools for doing the same thing (and to be fair, Linux does too, even with a vaguely Plan 9-like virtual file system).

> In Oberon I could pipe selected text from any application into any command that had a GUI aware type signature, for example.

Well, like I said, I'm not trying to say that all REPLs are equal, but most of what you're describing is still possible in at least Linux. I'm not saying it's as intuitive or as "pretty" as it would have been on Oberon, but it's certainly possible.

To be quite honest, most of what you've been posting on this topic has really just been elitism. And I do actually sympathise with your point as working in Bash can be a complete hateful mess at times (even without comparing it to the old Lisp machines). But that doesn't change the fact that Bash is a REPL environment.

> it only deals with bytes and cannot process any complex data structure.

Just FYI. Microsoft tried to address that problem in Windows a long time ago when it introduced Powershell.

Does Powershell provide a 'datatyping facility' for the contents of Files? or is it just values returned from Powershell defined objects or methods?
Contents of files are, depending on how you read them, either a byte[], a string, or a list of strings (lines). You can run them through parsers for JSON, XML, CSV or whatever else is handy to get actual objects. In the CSV case there are even cmdlets that work directly with files (Import-Csv, Export-Csv); for XML I usually use [xml](gc file); more generally, there are the ConvertFrom-* and ConvertTo-* cmdlets, e.g. for JSON and CSV.
See my reply to omaranto.