Hacker News
Language for Unix command line utilities?
6 points by gn 5765 days ago
I code for a small biocomputing company. We download nucleotide sequence and taxonomy information in a number of unrelated formats from a number of public repositories and run various kinds of translation and analysis on it. Much of what we write are small command line tools that search, summarize, or transform certain types of (large trees of) text files. These programs look and feel a lot like traditional core Unix utilities; our most widely used programs are essentially just a specialized version of diff and a specialized version of grep, respectively. We used to prototype most of our utilities as shell scripts or in Perl; we redid shell scripts in Perl or C (or sometimes Java) if they became performance bottlenecks.
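The core of a diff-like tool takes little code in Python. Below is a minimal sketch built on the standard library's difflib. The function name diff_files is hypothetical. A real "specialized diff" for sequence or taxonomy files would presumably add format-aware parsing on top of this.

```python
import difflib


def diff_files(old_path, new_path):
    """Return a unified diff (as a list of lines) between two text files."""
    with open(old_path) as f:
        old_lines = f.readlines()
    with open(new_path) as f:
        new_lines = f.readlines()
    # unified_diff yields "---"/"+++" headers, "@@" hunk markers and
    # " ", "-", "+" prefixed lines: essentially the `diff -u` layout.
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_path, tofile=new_path)
    )
```

To use it as a command, you could add a line like `sys.stdout.writelines(diff_files(sys.argv[1], sys.argv[2]))` to a small script.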

Some years ago we decided to move from Perl to Python for new projects because Perl programs had a way of always ending up as maintainability nightmares and because Perl seemed on the way out anyway. It largely worked, but we were never really, truly happy with our Python code. I suspect part of the reason is that Python can be (or at least feel) less succinct than even C if you do lots of low-level file system stuff with close error checking. The true reason is probably largely aesthetic. We can't explain what's wrong; we're just vaguely uneasy.

What other alternative to C should we be looking at? Ruby? Haskell? Is Go there yet? We have very open minds and are willing to consider pretty much anything that gives us reasonably easy and unmolested access to syscalls and their return values.

6 comments

I do a lot of command-line-tool writing, and my language at the moment is Haskell. Though I still count myself as a Haskell beginner, it's proving to be very productive. The old adage that if Haskell code compiles it works is mostly true: bugs are caught up front rather than after the code has been running in the wild for a while. Type inference, the explicit separation of pure code from IO, and functional composition are real boons. I need to build more familiarity with the libraries before I can be as productive in the short term (e.g. for one-offs) as I am in Python. But I already believe that in the long term (because one-offs never are!) I'm pulling ahead.

OCaml would be a good choice too. Both of these languages work very naturally with tree-like structures, and profiling and code coverage are very easy in both. IMHO there's no need to go to C for any but the most performance-critical code (and remember that your I/O etc. is already done in C, in the kernel). The C approach of checking the return value of every syscall (i.e. no exceptions) is very cumbersome.
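Python's os module shows the exception-based alternative to checking every return value. Its low-level wrappers stay thin, but a failing call raises an exception instead of returning -1. The errno value is still available on the exception when you need it. A minimal sketch follows; read_head and read_head_or_none are hypothetical names, not from the thread.

```python
import errno
import os


def read_head(path, n=64):
    """Read up to n bytes from path via the thin os-level syscall wrappers.

    os.open/os.read/os.close correspond closely to open(2)/read(2)/close(2),
    but a failing call raises OSError (with .errno set) instead of returning
    -1, so no per-call return-value check is needed here.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, n)
    finally:
        os.close(fd)


def read_head_or_none(path):
    """Handle the one errno we care about in one place; re-raise the rest."""
    try:
        return read_head(path)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return None
        raise
```

The design choice is that only the caller that cares about a missing file looks at errno. Every other failure propagates without extra code at each call site.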

Case in point today: rather than persuade our Unix guys to roll out Expect across a bunch of new machines, I rewrote a ~200 line Expect script I had in ~60 lines of Haskell and deployed a binary instead of a script.

On OCaml, http://ocaml.janestreet.com/ might interest the OP.

The only thing I can think of is AWK, but that's only slightly more readable than Perl and is probably less maintainable, since Perl has vastly superior profiling and debugging tools.

I mostly use Python for the sorts of things you are mentioning. And from what you're saying you don't like about Python, I suspect that going to Ruby or Haskell or the like is going to be worse. Python can call the underlying C routines more easily than either of those.
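The standard ctypes module is one way Python can call C routines directly. Below is a minimal sketch. It assumes a Unix-like system with a findable C library, and c_chdir is a hypothetical helper. It calls chdir(2) straight from libc and reads errno the way C code would.

```python
import ctypes
import ctypes.util
import os

# Assumes a Unix-like system. find_library("c") locates the C library
# (e.g. "libc.so.6" on glibc Linux). use_errno=True makes ctypes save errno
# after each foreign call so ctypes.get_errno() can report it.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Declare the C signatures: int chdir(const char *path) and getpid(void),
# whose pid_t is assumed to fit a C int (true on common platforms).
libc.chdir.argtypes = [ctypes.c_char_p]
libc.chdir.restype = ctypes.c_int
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int


def c_chdir(path):
    """Call chdir(2) directly; return (return value, errno) as C would see them."""
    rc = libc.chdir(os.fsencode(path))
    return rc, (ctypes.get_errno() if rc == -1 else 0)


rc, err = c_chdir("/no/such/directory")
print(rc, os.strerror(err))  # -1 and the strerror text for the saved errno
```

In practice the os module already wraps most syscalls. ctypes is mainly useful for the occasional libc or third-party C function it doesn't expose.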

It would be nice if you could provide an example of something you think is inelegant and/or awkward in Python so that we could figure out which direction to point you.

I do a lot of coding in common lisp and some programming in haskell, but wouldn't recommend either of those based on what I've heard from you so far. There are a few dataflow-style languages I've seen that would probably allow very succinct code, but they were all toys and performed quite poorly.

> It would be nice if you could provide an example

For me personally the main source of unhappiness is error messages. In C I can say

if (!(f = fopen(name, "r"))) die(name);

where die is a tiny function that prints name, followed by whatever strerror has to say on the subject, formatted in the usual fashion. One line, and I'm done with it. The obvious, conventional Python equivalent is four lines long because both try: and except: insist on a line of their own. Since I cannot tell Python to produce succinct unixy error messages instead of rambling stack traces, I have to catch and examine more or less every plausible exception. Some exceptions I can deal with close to the base of my call stack in a butt-ugly forty-line catch-all clause, but a large proportion of my syscalls end up taking three extra lines each. I know it's a trivial problem, but I agree with pg that you tend to be more productive the more of your actual application logic you can see.
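Python's OSError already carries strerror and filename, so the die() idiom translates fairly directly. Below is a minimal sketch. Here die mirrors the C helper described above, and open_or_die is a hypothetical wrapper. The four-line try/except lives in one place, and the call site shrinks back to one line.

```python
import os
import sys


def die(name, err):
    """Print "prog: name: strerror" to stderr and exit(1), like a C die()."""
    prog = os.path.basename(sys.argv[0] if sys.argv else "") or "prog"
    sys.stderr.write(f"{prog}: {name}: {err.strerror or err}\n")
    sys.exit(1)


def open_or_die(name, mode="r"):
    """Open name or die; the try/except is written once, here."""
    try:
        return open(name, mode)
    except OSError as err:
        die(name, err)


# Call site, one line as in C:
#   f = open_or_die(name)
```

The same pattern extends to whichever other calls a tool makes often, e.g. a hypothetical stat_or_die.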

> I do a lot of coding in common lisp

We did experiment with clisp a while back; it turned out not to be a natural fit for problems that involve a lot of pathname, datetime, and stat info manipulation. If there were a reasonably modern Lisp that let me say things like (localtime (nth 9 (stat "/foo"))) I would go looking for it this very afternoon.
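For comparison, the Python spelling of that one-liner is also a single expression. This sketch assumes the (nth 9 ...) was meant to pick out the modification time, which is index 9 in Perl's stat list. The name mtime_local is hypothetical, and "/" stands in for "/foo" so the example runs.

```python
import os
import time


def mtime_local(path):
    """Local-time struct for a file's modification time."""
    # Python names the stat fields (st_mtime, st_size, ...) instead of
    # making callers remember list positions.
    return time.localtime(os.stat(path).st_mtime)


print(time.strftime("%Y-%m-%d %H:%M:%S", mtime_local("/")))
```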

> Since I cannot tell Python to produce succinct unixy error messages instead of rambling stack traces

`sys.excepthook` is how you can do that.

Without:

     x = {}
     def f():
         print(x['foo'])
     f()
     # ...

     Traceback (most recent call last):
       File "stack.py", line 6, in <module>
         f()
       File "stack.py", line 4, in f
         print(x['foo'])
     KeyError: 'foo'

With:

     import sys

     def short_err(exc_type, exc, tb):
         sys.stderr.write("error: tracebacks too long\n")

     sys.excepthook = short_err

     x = {}
     def f():
         print(x['foo'])
     f()

     #...

     error: tracebacks too long

So don't worry about catching exceptions if you're just printing errors.

> `sys.excepthook` is how you can do that.

Awesome. Thank you kindly.

> error: tracebacks too long

I like your style.
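A less tongue-in-cheek version of the `sys.excepthook` trick can still print a single Unix-style line while keeping the useful part of the exception. Below is a minimal sketch; unix_excepthook is a hypothetical name. It deliberately throws away the traceback, so you may want some debug switch that leaves the default hook in place.

```python
import os
import sys


def unix_excepthook(exc_type, exc, tb):
    """Report an uncaught exception as one 'prog: message' line on stderr."""
    prog = os.path.basename(sys.argv[0] if sys.argv else "") or "prog"
    if isinstance(exc, OSError) and exc.filename is not None:
        # e.g. "prog: /etc/nope: No such file or directory"
        msg = f"{exc.filename}: {exc.strerror}"
    else:
        # e.g. "prog: KeyError: 'foo'"
        msg = f"{exc_type.__name__}: {exc}"
    sys.stderr.write(f"{prog}: {msg}\n")


sys.excepthook = unix_excepthook
```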

> ... Python can be (or at least feel) less succinct than even C ....

That's an interesting statement. Certainly, Python can be less succinct than Perl, particularly for small scripts where a quick "while(<>) {" and a regexp get most of your work done. But C??
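Python's closest analogue to that Perl idiom is the standard fileinput module. Below is a minimal grep-like sketch; grep is just an illustrative name here. It uses a FileInput instance rather than the module-level fileinput.input(), so an unfinished generator doesn't leave global state active.

```python
import fileinput
import re


def grep(pattern, files=None):
    """Yield (filename, line number, line) for each line matching pattern.

    Like Perl's while (<>): with files=None, FileInput reads the files named
    in sys.argv[1:], or standard input if there are none.
    """
    regex = re.compile(pattern)
    with fileinput.FileInput(files=files) as lines:
        for line in lines:
            if regex.search(line):
                yield lines.filename(), lines.filelineno(), line.rstrip("\n")
```

For example, `for name, n, text in grep(r"^>"): print(f"{name}:{n}:{text}")` would list FASTA-style header lines from the files given on the command line.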

> ... if you do lots of low-level file system stuff with close error checking.

Hmmm. In my experience, C's I/O libraries tend to make error checking something we leave by the wayside. Is there any chance that the real reason your Python scripts are longer is that you actually check for, and properly handle, the errors there, while in C you often don't?

In any case, I'll echo a comment from aidenn0:

> It would be nice if you could provide an example of something you think is inelegant and/or awkward in Python so that we could figure out which direction to point you.

With a modern version of Perl and the autodie pragma active (part of Perl 5.10.1), your dissatisfaction with verbose error handling can often simply disappear.

Python is just fine. O'Reilly has a great book, "Python for Unix and Linux System Administration", if you'd like some great suggestions and ideas.

Go is certainly there. It even has an easy-to-use syscall package.

See the File example http://golang.org/doc/progs/file.go?h=syscall