Hacker News
by saidajigumi 4272 days ago
The author specifically calls out that he's not talking about programming, per se. He's talking about the skill set of wrestling useful free software packages to one's own aims:

> So perhaps what is more important to a researcher than programming ability is adeptness at dealing with command-line bullshittery, since that enables one to become 10x or even 100x more productive than peers by finding, installing, configuring, customizing, and remixing the appropriate pieces of free software.

I'm torn about this article. Clearly this researcher, in his role as mentor, has identified a skill gap that's hindering his students. And it's perhaps even a problem whose pain the software community can ease. But many of the things he lists in passing come down to fundamental tools of software work: version control, package management, data manipulation, etc. Yes, the usage of these things on the command line tends to be "arcane", but that's because each is encoding its own problem domain. And if you're going to be working in software in any non-ivory-tower capacity, you'd better know this stuff.

I've dealt with this kind of problem numerous times before, in various contexts, with workflow tooling: a single (usually command-line) tool that neatly encapsulates the most common development use cases to reduce learning curves, cycle time, and errors. These can be phenomenally successful if done well, but if the context doesn't define a workflow (e.g. student A's vs. student B's research ideas) then there's no easy way to encapsulate the user's problems.


> Yes, the usage of these things on the command line tends to be "arcane", but that's because each is encoding its own problem domain.

Not necessarily. I've lately come to believe that Git, for instance, has beautiful, simple, and powerful core principles… and an unacceptably crappy user interface. A hashed DAG of commits with a few pointers to navigate it, that's great. But the magic incantations that you're required to type on the command line are too complex, unintuitive, inconsistent… and intellectually utterly uninteresting.

Git's core model is the interesting part, the one that will make you a better programmer, or computer user, or whatever it is you want to do that involves version control. But the specifics of the command line interface? That's neither interesting nor a portable skill. "Command-line bullshittery" is a perfect term to describe it.
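
To make the "hashed DAG with a few pointers" idea concrete, here is a minimal sketch using only stock git commands. The repository, the file name, and the commit identity are throwaway values invented for the demo; none of it comes from the linked article.

```shell
# Throwaway repository; the directory, file name and identity are arbitrary.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "first" > notes.txt
git add notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"
echo "second" >> notes.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -a -m "second commit"

# HEAD is just a pointer that resolves to a commit hash.
head_commit=$(git rev-parse HEAD)
parent_commit=$(git rev-parse HEAD~1)

# The raw commit object names its tree and its parent: one edge of the DAG.
git cat-file -p "$head_commit"
```

The last command prints the commit object itself, including a `parent` line holding the hash of the previous commit. That link is the whole model; everything else is interface.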

Why I believe that has been explained better than I ever could here: http://tonsky.me/blog/reinventing-git-interface/

Seriously, even "end losers" could use this. I also believe this can be generalised: some software just isn't usable through the command line. For day-to-day interactive use, it needs a neat, special-purpose graphical user interface. Bret Victor has taught us how powerful they can be.

The command line is still invaluable when interacting with other software, or for automation. Then it should be designed for those, not for interactive use. Simply put, it should be an API, which you could use to build your wonderful GUI on top of.

Paul Graham has talked about the pain of installing software, and every time I have to do it, I always have trepidation. "Is this going to be the time apt-get barfs at me?"

Software installation is still a big pile of bullshit. For the people who spend their time deep inside one ecosystem, it can be okay, but most people have something to do besides live deep inside one ecosystem.

A few weeks ago I was just trying to find a JavaScript minifier on a Linux VM. So I googled and spent an hour digging through various pieces of crap, incompatible versions of libraries, asinine "gem install" error messages, and fun reading Stack Overflow answers saying things like "why didn't you have lib-foo-fuck-you installed already?"

And none of this is valuable for me to learn because in five years all the current package maintenance stuff is going to be thrown out and replaced. Not necessarily by something better (although that's likely the hope, leading to http://xkcd.com/927/ ).

He also goes on to say,

"Throughout this entire ordeal where I'm uttering ridiculous epithets like 'git pipe fork pipe stdout pipe stderr apt-get revert rollback pipe pipe grep pipe to less make install redirect rm rm ls ls -l ls tar -zxvf rm rm rm ssh mv ssh curl wget pip,'"

In other words, "ridiculous epithets" seems to be equivalent to telling the machine to do something. Have you got a way to get git to control your source without actually invoking git?

Workflow tooling can indeed be incredibly useful, but the context isn't the only requirement for success. If something underpinning that tooling changes or breaks, someone is going to have to understand what happened.

The people who regard that understanding as "ridiculous" are the worst people to work with and, to my mind, are the primary reason that this "profession" gets little respect.

> Have you got a way to get git to control your source without actually invoking git?

No, but git does involve ridiculous epithets. No quotes, because I'm dead serious. As an interface, the Git command line is laughable, and doesn't deserve a passing grade. Yes, it's the only one we've got. Yes, many interfaces are even worse. Still, that's no excuse. We can do better. Hopefully someone will: http://tonsky.me/blog/reinventing-git-interface/

---

Let's take a simpler example:

  $ tar -xzf foo.tar.gz
So, you have to learn the name "tar". The option "-x" for extract, the option "z" for gzip, and the option "-f" for file (by the way, the "f" must come last, or it won't work). What the fuck is this retarded interface?

First, why do I have to tell tar to extract the thing, since it's obviously a compressed archive? Why do I have to tell tar that it's in gzip format? It can decompress it, surely it can check the compression format? And why, why, WHY do I have to tell it I'm processing a file? It KNOWS it's a freaking file!!!

Surely there must be an alternative, like… like…

  $ decompress foo.tar.gz
I personally don't know of such an alternative, and don't use one, because I was foolish enough to learn the basic 'tar' incantations by heart. Now that I know them, I can't summon the courage to use a decent interface instead.

But you see my point. Even for a simple tool such as tar, the common use case involves a cryptic incantation that shouldn't be needed in the first place. I'm sure many other UNIX tools are like that, I just didn't think about critiquing their interfaces thoroughly. Yet.

The core principles of the command line are very good. The level of relatively easy automation it provides is nothing short of amazing. This technology from the '70s is arguably more capable than most graphical interfaces in current use. But it does have its fair share of incidental complexity and useless quirks. We can do better. Let's not forget that.

Let's take your tar example. The -z and -x options are flags: they specify binary on/off options. You can specify all the flags separately on the command invocation like so:

  $ tar -x -z -f foo.tar.gz
However, typing -flag -flag2 -flag3 is too many keystrokes, so as a convenience you can combine all the flags in one call: -xzf. The -f isn't a flag, though; it takes an argument, which in this case is the filename foo.tar.gz. The argument is required and comes directly after the option. That is why the f has to come last in the combined form: its argument has to come right after it. Order doesn't matter for the x and z because they don't take arguments; they are just flags. It makes sense if you add in another option like -C, which also takes an argument. You would end up with:

  $ tar -xzfC foo.tar.gz directory_to_change_to
Which argument goes to which flag? Maybe the first flag gets the first argument? Then your argument order changes if you type in the flags backwards.

I don't know about your z flag; GNU tar doesn't need it when extracting. The x flag is needed because tar can do things other than extract, like list the contents of the archive with the -t flag, or create a new archive with -c.
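
As a rough illustration of the points so far, the sketch below spells every option out separately, so each argument sits right after the option that consumes it. The file and directory names are made up for the demo, and dropping -z on extraction assumes a tar that detects compression when reading, as GNU tar does.

```shell
# Scratch area; all file and directory names below are made up for the demo.
work=$(mktemp -d)
cd "$work"
mkdir src out1 out2
echo "hello" > src/hello.txt

# -c creates, -z gzips, -f takes the NEXT argument as the archive name.
tar -c -z -f foo.tar.gz src

# -f and -C each consume the argument that follows them, so written
# separately there is no doubt about which value belongs to which option.
tar -x -z -f foo.tar.gz -C out1

# Assumes a tar that detects compression on read (GNU tar does): no -z needed.
tar -x -f foo.tar.gz -C out2

# -t lists the contents instead of extracting them.
tar -t -f foo.tar.gz
```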

Finally, why is the -f option required? My first assumption was that it's because you need to specify the output file when you are creating an archive. I took a look in the manpage and the reason is a lot more interesting.

  Use archive file or device ARCHIVE.  If this option is not given, tar will
  first examine the environment variable `TAPE'. If it is set, its value will
  be used as the archive name. Otherwise, tar will assume the compiled-in
  default.
I knew that tar's name comes from the phrase "tape archive", but I hadn't put two and two together. Of course you need to specify that you are writing the archive to a file, because tar was created to back up data to tape! If you think about it, tar is actually doing the "right thing". Considering why it was written, tar has a sane default: write the data to the tape drive.
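
The fallback described in that manpage excerpt can be tried without a tape drive. This is only a sketch, assuming a tar that honours the `TAPE` variable as quoted above; `backup.tar` and `src` are names invented for the demo.

```shell
# Scratch area; backup.tar and src are arbitrary demo names.
work=$(mktemp -d)
cd "$work"
mkdir src
echo "hello" > src/hello.txt

# No -f anywhere: tar falls back to the archive named by $TAPE.
TAPE=backup.tar tar -c src

# Same fallback when listing the archive back.
listing=$(TAPE=backup.tar tar -t)
echo "$listing"
```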

Maybe you already understand all this and I'm reading too much into your simple example. It feels to me, though, that when people have issues with something like the unix command line it's because they just wanted to get something done and memorized an incantation to do it. There isn't anything wrong with that of course, but a tool like tar is SO much more powerful than just decompressing files. Once you start to dig into it, there is an internal consistency and logic to it.

> Maybe you already understand all this

Yes I do. Every single item. I just feel for the hapless student that is required to send a compressed archive of his work to the teacher, and is using tar for the first time.

There's only one little exception: I didn't know GNU tar doesn't require the '-z' flag (which by the way means 'this is a gzip archive') when extracting a tar.gz archive. Anyway, I bet my hat that the '-z' is required if you compress something and send the result to standard output: there will be no '.gz' hint to help the tool magically understand you want it compressed. If you omit it, tar will likely not compress anything.
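
That bet is easy to check. A gzip stream always starts with the bytes 1f 8b, so looking at the first two bytes of what tar writes to standard output shows whether anything was compressed. A minimal sketch, with file names and the `first_two_bytes` helper invented for the demo:

```shell
# Scratch area; src, plain.out and gzipped.out are arbitrary demo names.
work=$(mktemp -d)
cd "$work"
mkdir src
echo "hello" > src/hello.txt

# "-f -" sends the archive to standard output, captured here in a file.
tar -c -f - src > plain.out
tar -c -z -f - src > gzipped.out

# Hypothetical helper: hex of the first two bytes of a file.
first_two_bytes() { od -An -tx1 -N2 "$1" | tr -d ' \n'; }

plain=$(first_two_bytes plain.out)
gzipped=$(first_two_bytes gzipped.out)
echo "without -z: $plain"
echo "with -z:    $gzipped"
```

Only the second stream carries the gzip signature; without -z, tar writes a plain, uncompressed archive.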

The '-f' option is the most aggravating. Nobody uses tapes any more. Tar was doing the right thing, but no longer. -f should be dropped, or rendered optional, or replaced by '-o' for consistency with compilers… As it is, it's just ugly.

> It feels to me though that when people have issues with something like the unix command line its because they just wanted to get something done and memorized an incantation to do it. There isn't anything wrong with that of course […]

Actually there is. The users want to do something (usually a very common case such as compressing or decompressing an archive), then they have to memorise an arcane incantation. Yes, tar can do much more. Yes, the command line is powerful and flexible and more. This is Good. (Seriously, I miss my command line whenever I have to touch Windows.) On the other hand, some common cases are just not well handled, and it is full of idiosyncrasies that have nothing to do with the aforementioned benefits.

When the user wants to decompress files, it should not be more complicated than 'decompress archive.tar.gz'. Though thanks to an uncle comment, I now know of the 'unp' tool, which works just like that: 'unp archive.tar.gz', and you're done. (And the man page is refreshingly short.)
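
For what it's worth, the 'decompress' idea fits in a few lines of shell. The function below is a hypothetical helper written for this comment, not an existing tool and not how unp works; it only picks an extractor from the file name, and it assumes a tar that detects compression on its own when reading.

```shell
# Hypothetical helper: choose the extractor from the file extension.
decompress() {
    case "$1" in
        # Assumes tar auto-detects gzip/bzip2/xz when reading (GNU tar does).
        *.tar|*.tar.gz|*.tgz|*.tar.bz2|*.tar.xz) tar -x -f "$1" ;;
        # Plain gzip file: write the content next to it, keep the original.
        *.gz) gunzip -c "$1" > "${1%.gz}" ;;
        *.zip) unzip -q "$1" ;;
        *) echo "decompress: don't know how to handle '$1'" >&2
           return 1 ;;
    esac
}
```

With that defined, `decompress foo.tar.gz` unpacks into the current directory, and an unrecognised extension fails with a message instead of guessing.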

You don't specify -f to tell it you're processing a file, you specify -f to tell it that the next argument is the filename. And it doesn't have to come last.

    tar -z -f foo.tar.gz -x
That's a perfectly valid tar command. Also, obviously you have to tell it that you're extracting the file. How else would it know that you don't want to create an archive?

You're the second commenter who believed I didn't know this stuff.

I know that, and more. But go and explain each and every flag to a student who just wants to extract the first lesson of his first UNIX course. At this point, this is all magic and arbitrary.

> $ uncompress foo.tar.gz

Try unp, it's in the repo.

Told you we could do better! Seriously you just made my day. I'll use this from now on.

I think the biggest issue is that beyond the setup phase of development, these tools don't get used by the new researchers the author works with. If I'm developing a new program, having to run ten programs I've never seen before just to get started can be frustrating if I won't be actively using them as I work.

Programmers should learn the tools to stay efficient. Version Control, build tools, etc. are priceless. But if you force feed too much at once, nothing will stick. Couple that with what for some may be their first time on the command line, and you have a recipe for bullshittery.

I'm having a hard time understanding other commenters' grief that explaining git to someone who's never typed "ls" before is anything less than bullshit to slog through. These things are best learned one or two at a time.

> Clearly this researcher, in his role as mentor, has identified a skill gap that's hindering his students.

This is a well-known pre-course prep step, part of a bigger to-do list for all teachers: make sure the tools you suggest to the students are bundled so that they are easy to set up for your target audience.

I don't think he's talking about courses here. He's talking about his role as an advisor, not lecturer.

This issue comes up all the time with young researchers (i.e. graduate students). There are a huge number of free and open source packages that can help them implement and test their ideas, but actually getting them to work together can be an exercise in yak shaving.

That being said, having facility with command line tools is a valuable skill for any researcher.

Paraphrasing some actual experiences I've had: I want to install GNU Guile 2 on my Mac laptop so that I can write a prototype of an AI program. To install Guile 2 I need to install some prerequisite library. The prerequisite library won't build with the version of GCC I have installed. The easiest way to upgrade GCC is to get the newest version of Apple Xcode tools. The newest version of Xcode tools requires the latest version of OS X. But I also run Avid Pro Tools for music production on this computer, and the latest version of OS X is not clearly compatible with my version of Avid Pro Tools. So I'd need to pay $300 to upgrade Pro Tools so I can upgrade OS X so I can upgrade Xcode so I can upgrade GCC so I can build a library so I can install Guile 2.

Yeah, I've definitely felt that pain as well.

In this case, I'd suggest using a VM driven by Vagrant, unless you really need to be running native under OS X. That provides an isolated and repeatable environment, but at the cost of learning whole other domains of experience. My suggestion also hugely reinforces Dr. Guo's point: we've perhaps solved a problem by adding piles of additional tooling layers (vagrant CLI, Vagrantfile interface, VM domain knowledge, setting up a Linux host (even as a toy environment), setting up a Linux host as a build environment, etc.). Heck, if we're doing it right it'd be nice to use a provisioning tool to automate the VM and build environment setup. All of this stuff is awesomely powerful, but front-loading a student project with it is nuts unless there's a domain expert who's building this tooling for them and coaching the students through it. Again, that's not the sort of thing that's likely to happen for per-student research projects.

Isn't that simply one of the downsides of choosing to work on OSX?

Or maybe `brew install guile`?