| HN Mirror


by joosters 3709 days ago

You miss my point. Of course, for every example I give, it's possible to build a workaround to handle the spaces. My point is, it's the very fact that you need a workaround that makes it so irritating.

'command1 | command2' just works in most circumstances, so it's frustrating that it falls apart when a filename with a space appears.

And it technically is a shell issue, insomuch as the shell is dividing up the ARGV for each program. The shell is perhaps not to blame, because it can't tell the difference between a filename that has a space in it, and ordinary output that just so happens to correspond with a filename. In other words, it's hard to see what a shell could do to make things better. But the problem still exists.
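A minimal sketch of the splitting described above, assuming a POSIX-style shell. The scratch directory and the variable names `unquoted_count` and `quoted_count` are made up for illustration. It counts how many arguments the shell produces from the same command output, unquoted and quoted:

```shell
#!/bin/sh
# Hypothetical scratch directory, for illustration only.
dir=$(mktemp -d)
touch "$dir/a file"

# Unquoted: the shell splits the substitution's output on $IFS,
# so the single name "a file" arrives as two separate arguments.
set -- $(ls "$dir")
unquoted_count=$#

# Quoted: the output is kept together as one argument.
set -- "$(ls "$dir")"
quoted_count=$#

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"
rm -rf "$dir"
```

Quoting only helps when the output is a single name. With several names in one stream the shell has no way to know where one ends and the next begins, which is the problem being described.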


txutxu 3709 days ago

wc is not a builtin.

That example isn't a bash word-splitting issue.

I agree with you that shell scripting has caveats one needs to learn. As do Perl, C, PHP, Ruby, Node, Go, Java and whatnot.

I don't feel a big change is needed to handle spaces in shell scripts; my scripts handle them and I enjoy writing them. Maybe you know of minor tweaks for bash, zsh or any common shell that would be generally useful for handling files with spaces in their names? Don't hesitate to file a bug with them; maybe we'll even get a fix.

But don't send them this example and insist on it, because the conversation will be over:

    $ touch a_file
    $ ls | wc
          1       1       7
    $ rm a_file
    $ touch "a file"
    $ ls | wc
          1       2       7

Equivalent input, with/without spaces and expected output.

The 2 is a word count, and we did pass two words. I don't expect a 1 there; _that_ could be a bug.


joosters 3709 days ago

Once again, you're getting too hung up on my dumb little example, which I spent exactly 0 seconds thinking about. It's the general problem that's interesting (and annoying), the 'command1 | command2' general case.

If you want a difficult example, then take a more real-world example: e.g. the workflow of a 'find [some stuff] |grep [some other stuff]' is one to consider. That's where horrid workarounds like -print0 and -z have to come in, but the simple 'find|grep' works fine up until a file has a space in it.
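A rough sketch of that find/grep workflow, assuming GNU grep (for `-z`) and an xargs that supports `-0`. The file names and the variables `naive_count` and `safe_count` are hypothetical. It counts how many arguments a downstream command ends up receiving in each form:

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
dir=$(mktemp -d)
touch "$dir/my notes.txt" "$dir/todo.txt" "$dir/image.png"

# Naive pipeline: grep itself is fine (it works on whole lines), but the
# next consumer, xargs, splits "my notes.txt" at the space.
naive_count=$(find "$dir" -type f | grep '\.txt$' | xargs -n 1 echo | wc -l)

# NUL-delimited pipeline: every stage agrees that a name ends at a NUL byte.
safe_count=$(find "$dir" -type f -print0 | grep -z '\.txt$' | xargs -0 -n 1 echo | wc -l)

echo "naive: $naive_count argument(s), NUL-delimited: $safe_count argument(s)"
rm -rf "$dir"
```

Only two files match, so any count other than two means a name was broken apart somewhere along the pipeline.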

As I said, there's no simple fix, even for the re-organised form of 'grep [some other stuff] `find [some stuff]`', because the shell can't tell the difference between a filename and just a stream of text in the output of one program.
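To make the re-organised form concrete, here is a small sketch comparing it with `find -exec ... {} +`, which sidesteps the shell by letting find hand the paths to grep directly. The file names, the search string `needle`, and the variable names are invented for the example:

```shell
#!/bin/sh
# Hypothetical scratch directory and contents, for illustration only.
dir=$(mktemp -d)
printf 'needle\n' > "$dir/a file.txt"
printf 'needle\n' > "$dir/plain.txt"

# Backtick form: the shell word-splits find's output, so grep is handed
# ".../a" and "file.txt" as two paths, neither of which exists.
backtick_hits=$(grep -l needle `find "$dir" -type f` 2>/dev/null | wc -l)

# -exec form: find passes each path to grep as one intact argument,
# with no shell word splitting in between.
exec_hits=$(find "$dir" -type f -exec grep -l needle {} + | wc -l)

echo "backticks: $backtick_hits file(s), -exec: $exec_hits file(s)"
rm -rf "$dir"
```

This does not change the underlying point: it works only because the shell never sees the names as text, not because the shell learned to recognise filenames.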


kps 3709 days ago

The (ex) AT&T Research command tw(1) has pretty much replaced find(1) for me (particularly with some canned search selectors for particular projects).


joosters 3709 days ago

I'm struggling to find any information on this command (it's not an easy name to search for!) Do you have any links you could share please?


kps 3709 days ago

Apologies, I forgot to include a link: The toolkit is now at https://github.com/att/ast since AT&T laid off the group a couple years ago.

About half the package consists of evolutions of traditional Unix commands. The parts I use regularly are ksh and tw. tw ('tree walk') is sort of a 'find --exec' replacement with a C-like selector syntax. It's a bit verbose, so for interactive use I generally set up project-specific shell aliases with selection expressions, e.g.

  alias cctw=$'tw -e "select: return (type == REG) && ((name == \'*.c\') || (name == \'*.h\') || (name == \'*.cpp\') || (name == \'*.cc\') || (name == \'*.hpp\') || (name == \'*.mm\') || (name == \'*.inc\'));" '

and then use those, e.g.

  cctw egrep -w MyIdentifier


joosters 3709 days ago

Thanks, this looks interesting!


txutxu 3709 days ago

If I needed to combine find|grep right now (that is, if grep's own directory recursion and filters weren't enough, which may be a corner case too...), I could do it like this:

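    # Read NUL-delimited paths; IFS= and -r keep each name byte-for-byte.
    while IFS= read -rd '' file; do
      echo "do whatever with: $file"
      grep whatever -- "$file"
    # -type f skips directories, which grep would otherwise complain about.
    done < <(find ~/whatevers -type f -print0)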

It's like natural language if you do it daily.

Will handle not only spaces, but also

new

lines

file

names.

Have a nice day.


joosters 3709 days ago


txutxu 3708 days ago

For me, the code I gave is not a workaround.

It's the canonical way to do it.

Other ways, even if they are "expected to work by inexperienced occasional users"... are simply flawed at first glance.

A workaround is to ditch shell scripting as soon as you face a problem, blame it, and turn to a "more advanced language" that has the same or more caveats. That could be a workaround.

Delimiting file names with null bytes, in case they could be split by any of the $IFS values, is NOT a workaround; it's pure logic.
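A small sketch of why NUL is the delimiter that holds up, assuming a find that supports `-print0`. The file names and the variables `line_count` and `nul_count` are invented for illustration. A newline is a legal character in a file name, so counting lines miscounts files, while counting NUL terminators does not:

```shell
#!/bin/sh
# Hypothetical scratch directory with three files; one name contains a newline.
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/with
newline"

# Newline-delimited listing: the embedded newline adds an extra "line".
line_count=$(find "$dir" -type f | wc -l)

# NUL-delimited listing: keep only the NUL terminators and count them.
nul_count=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)

echo "lines: $line_count, NUL-terminated names: $nul_count"
rm -rf "$dir"
```

NUL and `/` are the only bytes that cannot appear in a file name component, which is why NUL works as an unambiguous terminator between paths.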
