Hacker News
by ricardobeat 3889 days ago
I (a frontend/JavaScript developer by trade) wholeheartedly recommend learning the standard Unix tools - bash, sed, grep, awk, xargs - for these tasks.

While the syntax is daunting at first, and escaping will be an eternal problem, eventually it starts making sense and you'll be able to do anything in seconds. The work in the post becomes trivial:

    mv app/views/foo/**/* app/views/
    sed -i 's/\([a-z]*\)foo_\([a-z]*\)path/\1\2path/' app/views/**/*
or in a more imperative (and probably less efficient) style:

    for file in app/views/foo/**/*; do
        sed -i 's/\([a-z]*\)foo_\([a-z]*\)path/\1\2path/' $file
        new_path=$(sed 's,foo/,,' <<< $file)
        mkdir -p $(dirname $new_path)
        mv $file $new_path
    done
4 comments

I don't do this anymore. I learnt how to use bash, sed, grep, awk, netcat, socat, and as a user, I work in terminals. I accumulated a lot of scripts, to the point where they were the first set of tools I would reach for.

But it becomes a self-reinforcing loop: the more you rely on this toolbox, the more problems you create that are best solved with the same tools and the particular mindset they promote. But for work-related tasks, throwaway code rarely is: your code becomes part of your infrastructure and you have to make it work reliably now and later.

After some years, things are not so fun anymore: assembling strings means escaping/quoting characters for different formats and languages. With shell scripts, the path of least resistance is unsafe and introduces many unneeded assumptions.

For example, in the above code, globbing in app/views/foo/... in the "for" expression will not work if you have spaces in file names. And even if you get that right, $file is not inside double quotes, and mv will erase any previously existing file. And even though some of the assumptions are valid in context, will they always hold?
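
The overwrite concern can be sketched as follows. This is only an illustration: safe_mv and the scratch paths are hypothetical names, not part of the original post, and the check-then-move sequence is not atomic.

```shell
# Hypothetical scratch layout: a file already exists at the destination.
tmp=$(mktemp -d)
mkdir -p "$tmp/views/foo"
echo 'existing' > "$tmp/views/index.erb"
echo 'incoming' > "$tmp/views/foo/index.erb"

# safe_mv is an illustrative helper, not a standard command.
# It moves $1 to $2 only when nothing exists at $2.
safe_mv() {
    if [ -e "$2" ]; then
        echo "refusing to overwrite: $2" >&2
        return 1
    fi
    mv -- "$1" "$2"
}

if safe_mv "$tmp/views/foo/index.erb" "$tmp/views/index.erb"; then
    result=moved
else
    result=skipped
fi
kept=$(cat "$tmp/views/index.erb")
echo "$result, destination still contains: $kept"
# prints: skipped, destination still contains: existing
rm -rf "$tmp"
```

GNU and BSD mv also accept -n (do not overwrite) and -i (ask first), which may be enough for interactive use.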

In order to make them hold, people tend to adapt their problem to fit their tools, not the other way around: file names are always in a well-known ASCII range, with no spaces but a specific separator and zero-padding (e.g. myfile_00020, because files are sorted lexicographically). Unless you are willing to use json/xml tools, data is cut into pieces of strings: records are line-separated, each field is colon-separated, each subfield is comma-separated, etc. (recursive CSV). And one day, that field which always contained dates in YY/DD/MM format (why not ISO-8601?) starts having a time too, formatted HH:MM. After a painful recovery of trashed data, you add "just one more script" to properly escape colons.
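
A small sketch of how that failure might look, assuming a made-up three-field record (name:date:status) and a hypothetical third_field helper; none of these names come from a real system.

```shell
# Hypothetical colon-separated records: name:date:status
record_old='job1:14/25/03:ok'
record_new='job1:14/25/03 10:30:ok'   # the date field gained an HH:MM time

# third_field is an illustrative helper that extracts the "status" column.
third_field() { printf '%s\n' "$1" | cut -d: -f3; }

before=$(third_field "$record_old")
after=$(third_field "$record_new")
echo "before: $before, after: $after"
# prints: before: ok, after: 30
```

The colon inside the time silently shifts every later field, so downstream scripts keep running on wrong data instead of failing.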

Languages are designed with specific goals in mind and writing complex programs is not what scripting languages are optimized for. I now prefer to use a language with data structures, functions, objects, namespaces: less use for strings and regexes. If I try to write onto an existing file, I will be warned (or I can explicitly allow overwriting). Paths are organized as OS-independent trees, etc.: there are better interfaces to the facilities provided by the OS. Coincidentally, the need for scripts gradually reduced, at least for the ones I need to save (I still chain programs on the command-line).

Globbing works fine with spaces, but you need to quote variables, e.g. $file -> "$file", to avoid parsing of their content by bash. See Bash-FAQ for details.
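
A minimal sketch of the difference, assuming the default IFS; count_args and the file name are made up for the demonstration.

```shell
# count_args is an illustrative helper: it reports how many arguments it got.
count_args() { echo "$#"; }

file='my report.txt'             # hypothetical name containing a space

unquoted=$(count_args $file)     # word splitting turns this into two arguments
quoted=$(count_args "$file")     # one argument, as intended

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: 2, quoted: 1
```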

The problem with shell is that you need to solve the same problems again and again. This is why I wrote the "bash-modules" script (https://github.com/vlisivka/bash-modules), which lets you write a module with common code and then just do ". import.sh module" from a script or the command line.

Yep, right, I forgot: globs are expanded after IFS word splitting. I retract my comment about globbing not working with spaces, and instead I'll add this one:

    app/views/foo/**/*
If the above does not match any file and you are not lucky enough to use nullglob, your script iterates once in the loop with the value bound to the pattern itself! (I learnt this from http://www.dwheeler.com/essays/filenames-in-shell.html, which is quite informative).
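
One way to guard against this without nullglob is to test each value before using it. A sketch, assuming an empty scratch directory and a made-up *.erb pattern:

```shell
tmp=$(mktemp -d)     # empty scratch directory, so the glob matches nothing

unguarded=0
guarded=0
for f in "$tmp"/*.erb; do
    unguarded=$((unguarded + 1))   # runs once: f holds the literal pattern
    [ -e "$f" ] || continue        # guard: skip anything that does not exist
    guarded=$((guarded + 1))
done
rmdir "$tmp"

echo "unguarded: $unguarded, guarded: $guarded"
# prints: unguarded: 1, guarded: 0
```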

I honestly commend you for writing a Bash library to solve problems people can have with bash ("Fight fire with fire"). I looked for other libraries, out of curiosity, and unsurprisingly there are a lot of them. See this list: http://elinux.org/Scriptin.

For example, http://marcomaggi.github.io/docs/mbfl.html has a function to split a pathname into different components, whose definition is here: https://github.com/marcomaggi/mbfl/blob/master/src/modules/f....

I am not criticizing, just stating that it looks painful to write and that I don't want to endure this.

Many CLI commands work differently with and without arguments, e.g. "cat foo" (read from file) and "cat" (read from stdin), so "nullglob" is a dangerous option. Just check your data instead. It is a bad idea to run sed on /dev/sda, anyway.
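
"Check your data" could look roughly like this. cat_matching and the *.log pattern are hypothetical; the point is only that the command never runs with an empty argument list, so it cannot fall back to reading stdin.

```shell
# cat_matching is an illustrative helper: it runs cat only when the glob
# actually matched something, instead of relying on nullglob.
cat_matching() {
    dir=$1
    set -- "$dir"/*.log          # the function's own positional parameters
    if [ -e "$1" ]; then
        cat -- "$@"
    else
        echo "no *.log files in $dir, not running cat" >&2
        return 1
    fi
}

tmp=$(mktemp -d)                 # empty: the glob matches nothing
if cat_matching "$tmp"; then status=ran; else status=skipped; fi

echo 'hello' > "$tmp/a.log"      # now the glob has one match
content=$(cat_matching "$tmp")
rm -rf "$tmp"

echo "$status / $content"
# prints: skipped / hello
```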

My "bash-modules" project is a wrapper around libraries. It solves a common problem: how to import a library (where is it located: /lib, /usr/share, /usr/lib, /etc, /usr/local, /usr/local/share, /opt/..., /home/..., etc.). Instead of path hell, you can just type ". import.sh ..." in the CLI, a script, or a library. It should also solve the problem with versions in a common way (via symlinks from the exact version to the general name, e.g. args-1.0.2.sh -> args.sh), but I have no free time to add that.

How about a nice game of golf?

    find app/views/foo -depth -type f | while read -r f; do mkdir -p "$(dirname "${f/foo\//}")"; sed <"$f" >"${f/foo\//}" -e 's/\([a-z]*\)foo_\([a-z]*\)path/\1\2path/g'; echo rm "$f"; done
It is better to learn Perl instead of sed/awk. It is much more powerful and cleaner than both of them:

    perl -pi -e 's{(.*?)foo_(.*?)path}{$1$2path}' app/views/*/*
I can't recommend this enough. I was afraid of the shell even after having used it quite a lot, preferring to do my changes (like these) with an IDE, and later Perl.