Hacker News
by txutxu 3716 days ago
ls is not bash. wc is not bash.

And that is a discouraged way to count files in a shell script.

Still, simply listing one name per line and counting lines (ls -1 | wc -l) would give the correct file count, spaces included.

I insist, spaces are not your enemy; there are much weirder file names for a shell to deal with. The shell can handle spaces if used properly.
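
For what it's worth, here is a minimal sketch of counting files without parsing ls output, using only the shell's own globbing. The helper name count_files is made up for illustration, and like plain ls it ignores dotfiles.

```shell
# count_files is a hypothetical helper, not a standard command.
# It counts the non-hidden entries of a directory using the shell's own
# globbing, so spaces and newlines in names cannot affect the result.
count_files() {
  set -- "$1"/*    # one positional parameter per matching entry
  # An unmatched glob stays literal, so check that a single result exists.
  if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
    echo 0
  else
    echo "$#"
  fi
}
```

Something like count_files . would then report the number of entries in the current directory, whatever characters their names contain.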


You miss my point. Of course, for every example I give, it's possible to build a workaround to handle the spaces. My point is, it's the very fact that you need a workaround that makes it so irritating.

'command1 | command2' just works in most circumstances, so it's frustrating that it falls apart when a filename with a space appears.

And it technically is a shell issue, insomuch as the shell is dividing up the ARGV for each program. The shell is perhaps not to blame, because it can't tell the difference between a filename that has a space in it, and ordinary output that just so happens to correspond with a filename. In other words, it's hard to see what a shell could do to make things better. But the problem still exists.
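
To illustrate how the shell divides a value into arguments, here is a small sketch. The helper count_args is hypothetical and only reports how many arguments it received; the default $IFS is assumed.

```shell
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

name="a file"
count_args $name      # unquoted: split on $IFS into "a" and "file", prints 2
count_args "$name"    # quoted: stays a single argument, prints 1
```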

wc is not a builtin.

In that example we are not facing a bash word-splitting issue.

I agree with you that shell scripting has caveats one needs to learn. So do Perl, C, PHP, Ruby, Node, Go, Java and whatnot.

I don't feel a big change is needed to handle spaces in shell scripts; my scripts handle them and I enjoy writing them. Maybe you know of minor tweaks for bash, zsh or any common shell that would help in general with files that have spaces in their names? Don't hesitate to file a bug with them; maybe we'll even get a fix.

But don't send them this example and insist on it, because the conversation would be over:

    $ touch a_file
    $ ls | wc
          1       1       7
    $ rm a_file
    $ touch "a file"
    $ ls | wc
          1       2       7
Equivalent input, with/without spaces and expected output.

The 2 is a word count, and we did pass two words. I wouldn't expect a 1 there; _that_ could be a bug.
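
The same thing can be seen without touching any files. In this sketch, wc_fields is a hypothetical helper that just normalises the spacing of the three default wc columns (lines, words, bytes).

```shell
# wc_fields is a hypothetical helper: print the line, word and byte counts
# of standard input separated by single spaces.
wc_fields() { wc | awk '{ print $1, $2, $3 }'; }

printf 'a_file\n' | wc_fields   # prints: 1 1 7
printf 'a file\n' | wc_fields   # prints: 1 2 7
```

Only the word column changes, which is what a word count is supposed to do.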

Once again, you're getting too hung up on my dumb little example, which I spent exactly 0 seconds thinking about. It's the general problem that's interesting (and annoying), the 'command1 | command2' general case.

If you want a difficult example, then take a more real-world one: the workflow of 'find [some stuff] | grep [some other stuff]'. That's where horrid workarounds like -print0 and -z have to come in, but the simple 'find | grep' works fine up until a file has a space in it.

As I said, there's no simple fix, even for the re-organised form of 'grep [some other stuff] `find [some stuff]`', because the shell can't tell the difference between a filename and just a stream of text in the output of one program.
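
One option that sidesteps the problem for the 'grep over found files' case is to let find run grep itself, so the file names are handed over as arguments and never pass through the shell as text. A minimal sketch, where grep_tree is a hypothetical wrapper around the standard -exec ... {} + form of find:

```shell
# grep_tree is a hypothetical wrapper, not an existing command.
# usage: grep_tree PATTERN DIR
# find passes each file name to grep as its own argument, so spaces and
# newlines in names are never re-split by the shell.
grep_tree() {
  find "$2" -type f -exec grep -l -e "$1" -- {} +
}
```

For example, grep_tree MyIdentifier src would list the files under src that contain MyIdentifier. It does not solve the general 'command1 | command2' case, only this one pattern.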

The (ex) AT&T Research command tw(1) has pretty much replaced find(1) for me (particularly with some canned search selectors for particular projects).
I'm struggling to find any information on this command (it's not an easy name to search for!). Do you have any links you could share, please?
Apologies, I forgot to include a link: The toolkit is now at https://github.com/att/ast since AT&T laid off the group a couple years ago.

About half the package consists of evolutions of traditional Unix commands. The parts I use regularly are ksh and tw. tw ('tree walk') is sort of a 'find --exec' replacement with a C-like selector syntax. It's a bit verbose, so for interactive use I generally set up project-specific shell aliases with selection expressions, e.g.

  alias cctw=$'tw -e "select: return (type == REG) && ((name == \'*.c\') || (name == \'*.h\') || (name == \'*.cpp\') || (name == \'*.cc\') || (name == \'*.hpp\') || (name == \'*.mm\') || (name == \'*.inc\'));" '
and then use those, e.g.

  cctw egrep -w MyIdentifier
If I needed to combine find and grep right now (that is, if grep's own directory recursion and filters weren't enough, which may be a corner case too...), I could do it like this:

    while IFS= read -rd '' file; do
      echo "do whatever with: $file"
      grep whatever -- "$file"
    done < <(find ~/whatevers -print0)
It's like natural language if you do it daily.

It will handle not only spaces, but also

new

lines

on

file

names.

Have a nice day.

You miss my point. Of course, for every example I give, it's possible to build a workaround to handle the spaces. My point is, it's the very fact that you need a workaround that makes it so irritating.
For me, the code I gave is not a workaround.

It's the canonical way to do it.

Other ways, even if they are "expected to work by inexperienced occasional users"... are simply flawed at first glance.

A workaround is to ditch shell script as soon as you face a problem, blame shell script, and turn to a "more advanced language" that has the same or more caveats. That could be a workaround.

Delimiting file names with null bytes, in case they could be split by any of the $IFS values, is NOT a workaround; it is pure logic.
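
To make that concrete, here is a small sketch comparing newline-delimited and NUL-delimited counting. It assumes a find that supports -print0 (GNU and BSD find do); both function names are hypothetical.

```shell
# Both helpers are hypothetical names used only for this illustration.

# Counts output lines, so a file name containing a newline is counted twice.
count_by_newline() { find "$1" -type f | wc -l | tr -d ' '; }

# Counts NUL terminators instead. A file name can never contain a NUL byte,
# so each file is counted exactly once.
count_by_nul() { find "$1" -type f -print0 | tr -dc '\0' | wc -c | tr -d ' '; }
```

On a directory holding one ordinary file and one file with a newline in its name, the first helper would report 3 and the second 2.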

That there are worse things that exist is no reason not to pluck the low hanging fruit. I see this attitude on HN a lot. Spaces are probably the 80 in the 80/20 rule here. Why not address them?
Please give a single example in shell involving spaces that has not already been addressed.

Where do you get that rule from?

It's the Pareto principle. In this case he means "20% of the problems account for 80% of the trouble", albeit somewhat strangely worded.
Thanks, I did know about the theory. My question was more about where the numbers come from.

I write shell daily, and those are not my numbers regarding "spaces in filenames" issues, sorry.

Not sure why this comment got down-voted.

If it's because I did not explain how to do that, here is how to do it:

    http://mywiki.wooledge.org/BashFAQ/004
If it's because of the comment on the given example...

    $ touch file1
    $ ls -1 | wc -l
    1
    $ touch "file 2"
    $ ls -1 | wc -l
    2
... (?)

If we were talking about "new lines in file names", or "dashes at the beginning of file names", or code injection through file names, then we could be talking about more complex solutions.

But the space issues in shell are simple and have known solutions. If you're a daily user, or past the learning stage, spaces don't turn out to be an issue.
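
As an aside on the "dashes at the beginning of file names" case, a minimal sketch of the usual guard. The helper first_line is hypothetical; it assumes a head that accepts the conventional "--" end-of-options marker, as the GNU and BSD versions do.

```shell
# first_line is a hypothetical helper: print the first line of a file whose
# name may start with a dash. The "--" ends option parsing, so a name such
# as "-n" is treated as a file operand and not as an option.
first_line() { head -n 1 -- "$1"; }
```

Prefixing the name with ./ (for example ./-n) has the same effect for tools that do not understand "--".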

I'm guessing it was down-voted because you keep missing the point. Yes, there are work-arounds for this scenario, but the overhead in remembering the work-arounds is the problem. He's not saying you can't do it, he's saying the problem is that you have to change your setup for edge-cases, which are actually fairly common.
Indeed, edge cases keep hitting you until you know them and how to work around them. Is that a shell-specific issue? I think it is a universal one.

A big problem is that people who underestimate an unfamiliar technology do not take the time to learn it properly.

Many programmers (and many beginners) think they "know" shell, so they don't invest more time and testing, and then they keep hitting corner cases and known, documented, solved issues, etc...

You are asserting that learning bash is harder than learning the corner cases of other, more advanced languages. Is that what you're saying? Do you think that "bash" has more corner cases than ... (?)

How can we patch that? With a web shell?

Totally true, I've totally missed the point.

It's not even so much the overhead in remembering the work-arounds, as remembering when exactly you need the work-arounds.
Shell scripts split on spaces (or the values of $IFS) by design.

For example, use null-delimited values, because a variable (like IFS) cannot contain null bytes. Is this a workaround? I see it as pure logic.

Even understanding how to handle them helps you understand the internal design of the shell and common external utilities.

Remembering is hard? That's not a "shell"-specific issue... I use it daily, and maybe that's why I don't face the same problems that others see so clearly.

haha, yes, that's what I meant. Your phrasing is better
But, I need to remember to do that.

How about (for example):

    $ for i in $(ls | grep 'cheese\|fish' | head -n 5) ...
I'm sure there is a way to make this work correctly (take 5 filenames containing either cheese or fish), but I'd need to put more thought into making it work correctly.

  ls | grep 'cheese\|fish' | head -n 5 | while read -r i …
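
For comparison, a glob-based sketch in plain POSIX sh that never parses ls output, so it also copes with newlines in names. The helper first_matching is hypothetical, and since plain sh has no local variables, limit, count and f are left as globals.

```shell
# first_matching is a hypothetical helper: run a command on at most N
# entries of the current directory whose names contain "cheese" or "fish".
# usage: first_matching N COMMAND [ARGS...]
first_matching() {
  limit=$1; shift
  count=0
  for f in ./*; do
    [ "$count" -lt "$limit" ] || break
    case $f in
      *cheese*|*fish*) ;;      # keep names containing either word
      *) continue ;;           # skip everything else
    esac
    "$@" "$f"                  # the name is passed as one quoted argument
    count=$((count + 1))
  done
}
```

Something like first_matching 5 printf '%s\n' would print the first five matching names in glob order. It is clearly more typing than the pipeline, which is rather the point being argued here.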
With Zsh:

  > touch {dark\ red,green,light\ blue}{cheese,fish\ fingers,fruit}
  > ls -1 *(cheese|fish)*([1,5])
  dark redcheese
  dark redfish fingers
  greencheese
  greenfish fingers
  light bluecheese
  > for i in *(cheese|fish)*([1,5]) ...
Since that's shell globbing, it can cope with any spaces, newlines etc.
Wow, that is quite impressive, I might have to look into this more. The advantage of using head is that I only have to learn it once to get the first n lines or files, but this might be worth learning. Thanks.
I know 25% of the special Zsh syntax, and know of a further 25% of what exists.

I read the whole manpage some years ago, and noted down what seemed useful. I use "man zshexpn" when I forget the syntax for things like this.