Hacker News
by xiaq 4156 days ago
Time for advertisement! You might like elvish https://github.com/elves/elvish which proudly has optional typing and much better-defined semantics than, say, bash. It is a work in progress, though.

Advertisement aside, there is some inherent unsafety in shell scripts that cannot be easily resolved, namely the unsafety involved in interacting with external commands.

Compared to other scripting languages, the greatest advantage of shell languages is the convenience of interacting with external programs. However, at least in Unix, there are few static constraints you can apply to them. All we know is that the program will (probably) parse something in argv, which is just bytes, (probably) take something from stdin, which is just bytes, and (probably) put something on stdout, which is again just bytes; there is no universal method to check that the command's arguments are well-formed, or that the input format is correct, or that the output format conforms to a certain schema, without running the actual program. One solution would be to define some kind of static protocol for external programs so that their invocations can be statically checked, but it's already too late for that.
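
To make the "just bytes" point concrete, here is a minimal Python sketch. The child program is a made-up stand-in (the Python interpreter itself, so the example runs anywhere), not any particular Unix tool. Nothing in the call tells the caller what format the child expects or produces; you only find out by running it.

```python
import subprocess
import sys

# Hypothetical child program: it upper-cases whatever arrives on stdin.
# Any executable would do; the interpreter is used only for portability.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

# The caller hands over bytes and gets bytes back. No schema is declared
# anywhere, so nothing can be checked before the program actually runs.
result = subprocess.run(child, input=b"just bytes",
                        stdout=subprocess.PIPE, check=True)

output = result.stdout
print(type(output).__name__, output)
```

Any structure in `output` (lines, fields, JSON) is a convention the caller has to know about out of band.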

2 comments

Interesting! At first glance, this might be the most appealing attempt to improve on the shell I've seen yet. It's a really hard design space. Some questions:

Why is set necessary? Once you have declaration with var, can't mutation be done without set?

What's your thinking behind making var mandatory for declarations? Safety is obvious, but it seems like terseness is a really big goal for shell programming, especially interactive use.

Also, documentation-wise, I don't see how/if you do variable expansion in strings. Same as sh?

var is for declaration, set for assignment. This is an important contrast that some dynamic languages miss; ironically JavaScript got it right. Contrast this

    var $x = "foo"; if $true { set $x = "bar" }; echo $x # outputs "bar"
with

    var $x = "foo"; if $true { var $x = "bar" }; echo $x # outputs "foo"
The declaration/assignment contrast is very important when it comes to closures (and there are closures in elvish). In Python 2, for instance, there is no way (!) to assign to outer variables in closures, since `=` declares and assigns at the same time in a `def` block.
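
A small Python sketch of the closure problem, for readers who have not run into it. The function names are made up for illustration. The first function shows plain `=` inside a nested `def` creating a new local; the second uses Python 3's `nonlocal` statement, which Python 2 does not have.

```python
def outer_after_plain_assignment():
    count = 0

    def bump():
        # Plain `=` declares a NEW local named count; the outer one is untouched.
        count = 1
        return count

    bump()
    return count


def outer_after_nonlocal_assignment():
    count = 0

    def bump():
        # `nonlocal` (Python 3 only) says: assign to the enclosing variable.
        nonlocal count
        count = 1

    bump()
    return count


print(outer_after_plain_assignment())     # 0
print(outer_after_nonlocal_assignment())  # 1
```

In effect `nonlocal` plays the role of `set` here: it marks the statement as an assignment to an existing variable rather than a declaration.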

There are no variable expansions, but strings are concatenated implicitly when they run together. In sh:

    echo "hello $name, welcome!"
In elvish:

    echo "hello "$name", welcome!"
Implicit concatenation can read a bit weird at first, but it's actually conceptually much simpler and only slightly more cumbersome than string interpolation. It also makes the syntax much simpler.
JavaScript didn't really get it right:

  var a = 1;

  function four() {
    if (true) {
      var a = 4;
    }

    alert(a); // alerts '4', not the global value of '1'
  }
Also, if you omit 'var', the code is still legal (except in strict mode), and the variable winds up in the global scope, which is a recipe for disaster.

Still, it's nice that 'var' exists in JS at all. The idea that variable declarations are unnecessary noise and should be elided -- an idea that dates back at least to BASIC -- is, in my opinion, one of the worst seductive ideas in programming language design. Unless your language has only a single global scope (like BASIC), it always causes problems -- and we know that block scope is important for nontrivial programs.

Elvish sounds interesting; I will have to check it out.

I don't think we know that. We know some notion of lexical scope is valuable, but function only vs block scope seems like an issue of familiarity/style.
Okay, but that doesn't change my point that a single global scope is archaic, and for good reason.
So I see why you want var, but can't you just treat "x = 5" as "set x = 5"?

You can also remove the need for var (at the expense of no safety against typos) by prohibiting shadowing, like CoffeeScript does.

The need for `set` has to do with syntax. In shells the first word of a statement is always considered to be the command, so "x = 5" will not work - it reads "execute command 'x' with arguments '=' and '5'". The traditional solution is to treat the command as an assignment when it contains '=', so you write "x=5" and you are prohibited from adding any spaces, which I find aesthetically very unpleasant.
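
A toy Python sketch of that traditional rule, to show why the spaces matter. `classify` is a hypothetical helper, not any real shell's parser, and it ignores details such as `x=5 cmd` (an assignment prefix followed by a command).

```python
import shlex


def classify(line):
    """Classify a line the way the traditional rule does (simplified)."""
    words = shlex.split(line)          # split on whitespace, honoring quotes
    first = words[0]
    name, sep, value = first.partition("=")
    # The first word counts as an assignment only if it contains '='
    # directly after a valid variable name, with no spaces in between.
    if sep and name.isidentifier():
        return ("assign", name, value)
    # Otherwise the first word is the command and the rest are arguments.
    return ("command", first, words[1:])


print(classify("x=5"))    # ('assign', 'x', '5')
print(classify("x = 5"))  # ('command', 'x', ['=', '5'])
```

With spaces, `=` is just another argument to a command named `x`, which is exactly the reading described above.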

I was not aware of the CoffeeScript approach towards shadowing before. I will look into it, but it seems to be a very controversial design choice of CoffeeScript.

Re CoffeeScript and shadowing: I don't like it because I dislike the mismatch in semantics with JavaScript and am used to Python, but beyond that, I'm not sure it's wrong.

Re set: it would slightly complicate your grammar, but I don't think detecting "word space* = ..." would create any ambiguities.

Partially this issue is just about how much you value explicitness/regularity vs. concision.

@hyperpape we seem to have hit the critical level for flame war and now I cannot reply to you :) it's said I will be able to reply after some cooldown time, but here is my reply:

re "word space* = ...": should this echo an equal sign and $ip, or assign $ip to $echo?

    echo =$ip
You're right. I knew at some point I'd make a bad assumption based on my own (limited) forays into writing a shell language.

In my case, I'm treating "=" as not able to be included in an unquoted string literal, and I'm requiring variables to start with $.

So in my shell, this would be a syntax error.

    echo =$ip
This works.

    $echo = $ip
If you relaxed the variable naming idea, it would set echo to $ip, but that's probably a bad idea.
I think a decent solution is to have a shell alternative that defines interfaces for known programs (much like autocomplete scripts do now).

I know ls returns a list of files, so I should be able to use that. I don't know foo, so it's basically string -> string or whatever, but if an enterprising soul does know about foo, they could write an abstraction layer for it.

The trick is making a simple interface for that
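
One way such an interface could look, as a rough Python sketch. The registry, its name `PARSERS`, and the `interpret` function are all assumptions made up for illustration; no existing shell works exactly this way. Known commands get a parser that returns a typed value, and unknown commands fall back to string -> string.

```python
from typing import Callable, Dict, List, Union

Parsed = Union[str, List[str]]

# Hypothetical registry: command name -> function turning raw stdout text
# into a typed value. Splitting `ls` output on lines is deliberately naive.
PARSERS: Dict[str, Callable[[str], Parsed]] = {
    "ls": lambda out: out.splitlines(),
}


def interpret(command: str, raw_output: str) -> Parsed:
    """Apply the registered parser, or return the raw string for unknown commands."""
    parser = PARSERS.get(command, lambda out: out)
    return parser(raw_output)


print(interpret("ls", "a.txt\nb.txt\n"))  # ['a.txt', 'b.txt']
print(interpret("foo", "whatever foo prints"))  # 'whatever foo prints'
```

Someone who knows `foo` only needs to add one entry to the registry; everything else keeps working with plain strings.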

Yes, one thing that can be done (in future) with elvish is writing wrappers for external commands that run the commands and convert their (bytes) output to strongly typed values.

For instance, `ls` outputs a bunch of lines where each line is supposedly a single file name, but this breaks when some file name contains a `\n`. It is possible to use `ls -b` to escape special characters, but now you have to un-escape the filenames when you pass them to other commands. With elvish it is possible to write a wrapper around `ls` that actually outputs a list (yes there are lists in elvish) of strings and each member of the list can be passed around without un-escaping.
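
A quick Python illustration of the newline problem. The file names are made up, and the `ls`-style output is simulated rather than produced by running `ls`. The `list_directory` function swaps in `os.listdir` as a stand-in for such a wrapper: it returns a real list, so names never need escaping.

```python
import os
from typing import List

# Two made-up file names; the second one contains a newline.
names = ["plain.txt", "two\nlines.txt"]

# Simulated line-oriented listing: one name per line, as a consumer sees it.
ls_like_output = "\n".join(names) + "\n"
as_lines = ls_like_output.splitlines()

print(len(names), len(as_lines))  # 2 3


def list_directory(path: str) -> List[str]:
    """Stand-in for a typed wrapper: each list element is one whole file name."""
    return sorted(os.listdir(path))
```

Two files turn into three "lines", while the list keeps each name intact no matter what characters it contains.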

Also there's a question about whether you can properly deal with option hell. Parsing every possible output of ls reliably in the face of malicious filenames sounds...fun.

Edit: It's really impossible to avoid edge cases. Take find: you can't parse its output, because it's just a list of filenames separated by \n. But filenames can contain just about any character. How do you handle /home/bar\n/tmp?

Maybe you just ignore pathological input, but now you're regressing towards the state of bash.

Option hell is indeed a problem.

The problem with `find` happens to have a solution (-print0). However, it is a PITA to deal with \0-separated strings in traditional shells, unless you pipe them to another command that happens to recognize \0-separated strings.

With elvish you can parse the \0-separated strings output by `find ... -print0` into a genuine list - not lines (which are \n-separated strings) or \0-separated strings, but real lists that support indexing, iteration, etc. and there is absolutely no chance that two consecutive items will run together or one item will be treated as two. Imagine how fantastic it is to deal with that :)
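
The same idea sketched in Python rather than elvish, since elvish's own syntax for this is not shown in the thread. The helper names are made up. File names stay as bytes, because that is all the kernel guarantees about them.

```python
import subprocess
from typing import List


def parse_nul_separated(raw: bytes) -> List[bytes]:
    """Split `find -print0` style output into a real list of names."""
    parts = raw.split(b"\0")
    # Every name is terminated by \0, so the last piece is an empty leftover.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


def find_files(root: str) -> List[bytes]:
    """Run `find ROOT -print0` and return its results as a list (assumes find is installed)."""
    raw = subprocess.run(["find", root, "-print0"],
                         stdout=subprocess.PIPE, check=True).stdout
    return parse_nul_separated(raw)


# The pathological name from earlier in the thread, plus an ordinary one.
sample = b"/home/bar\n/tmp\0/etc/hosts\0"
files = parse_nul_separated(sample)
print(files)  # [b'/home/bar\n/tmp', b'/etc/hosts']
```

The name with the embedded newline comes back as a single item, and the list supports indexing and iteration like any other.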