by txutxu 3716 days ago

ls is not bash. wc is not bash.

And that is a discouraged way to count files in a shell script.

Still, simply adding -l to wc would handle spaces correctly (for the file count).

I insist, spaces are not your enemy; there are much weirder file names for a shell. The shell can handle spaces if used properly.
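A minimal sketch of counting directory entries without parsing ls output, assuming a POSIX sh or bash. The names `count_files` and `demo_dir` are hypothetical, chosen only for illustration, and hidden files are not counted, as with plain ls.

```shell
# Hypothetical helper: count the non-hidden entries in directory $1 with a glob.
count_files() {
    n=0
    for entry in "$1"/*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a_file" "$demo_dir/a file" "$demo_dir/two
lines"

file_count=$(count_files "$demo_dir")
echo "$file_count"    # three entries, whatever whitespace the names contain
rm -rf "$demo_dir"
```

The glob hands each name to the loop as a single word, so neither the space nor the embedded newline changes the count.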

joosters 3716 days ago

You miss my point. Of course, for every example I give, it's possible to build a workaround to handle the spaces. My point is, it's the very fact that you need a workaround that makes it so irritating.

'command1 | command2' just works in most circumstances, so it's frustrating that it falls apart when a filename with a space appears.

And it technically is a shell issue, insomuch as the shell is dividing up the ARGV for each program. The shell is perhaps not to blame, because it can't tell the difference between a filename that has a space in it, and ordinary output that just so happens to correspond with a filename. In other words, it's hard to see what a shell could do to make things better. But the problem still exists.
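A small sketch of the ARGV division described above, assuming sh or bash with the default IFS. `count_args` is a hypothetical helper that only reports how many arguments it received.

```shell
# Hypothetical helper: report how many arguments (ARGV entries) it received.
count_args() {
    printf '%s\n' "$#"
}

name="a file"

unquoted=$(count_args $name)     # the shell splits the value on the space
quoted=$(count_args "$name")     # quotes keep the value as a single argument

echo "unquoted: $unquoted, quoted: $quoted"    # unquoted: 2, quoted: 1
```

The split happens when the shell expands an unquoted variable or command substitution, before the program runs, which is why the program itself cannot tell one filename from two words.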

txutxu 3716 days ago

wc is not a builtin.

In that example we're not facing a bash word-splitting issue.

I agree with you that shell scripting has caveats one needs to learn. As do Perl, C, PHP, Ruby, Node, Go, Java and whatnot.

I don't feel a big change is needed to handle spaces in shell scripts; my scripts handle them and I enjoy writing them. Maybe you know of minor tweaks for bash, zsh or any common shell that would be generally useful for files with spaces in the name? Don't hesitate to file a bug with them; maybe we'll even get a fix.

But don't send them this example, and insist on it, because the conversation is over:

    $ touch a_file
    $ ls | wc
          1       1       7
    $ rm a_file
    $ touch "a file"
    $ ls | wc
          1       2       7

Equivalent input, with and without spaces, and the expected output.

The 2 is a word count, and we did pass two words. I don't expect a 1 there; _that_ could be a bug.

joosters 3716 days ago

Once again, you're getting too hung up on my dumb little example, which I spent exactly 0 seconds thinking about. It's the general problem that's interesting (and annoying), the 'command1 | command2' general case.

If you want a difficult example, then take a more real-world example: e.g. the workflow of a 'find [some stuff] |grep [some other stuff]' is one to consider. That's where horrid workarounds like -print0 and -z have to come in, but the simple 'find|grep' works fine up until a file has a space in it.

As I said, there's no simple fix, even for the re-organised form of 'grep [some other stuff] `find [some stuff]`', because the shell can't tell the difference between a filename and just a stream of text in the output of one program.
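For reference, a sketch of what the -print0 / -z form of such a pipeline can look like, assuming GNU grep and findutils (`grep -z` and `xargs -0` are not available everywhere). The directory, file names and the `selected` variable are made up for the demonstration.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/red fish.txt" "$demo_dir/blue cheese.txt" "$demo_dir/plain.txt"

# Every stage uses NUL as the record separator, so spaces in names are inert.
# grep -z reads NUL-terminated records; xargs -0 turns each one into one argument.
selected=$(cd "$demo_dir" &&
    find . -type f -print0 |
    grep -z -e 'cheese' -e 'fish' |
    xargs -0 printf '%s\n' |
    sort)

printf '%s\n' "$selected"
rm -rf "$demo_dir"
```

The final printf stands in for whatever command should receive the selected files; each name reaches it as a single argument.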

kps 3716 days ago

The (ex) AT&T Research command tw(1) has pretty much replaced find(1) for me (particularly with some canned search selectors for particular projects).

joosters 3716 days ago

I'm struggling to find any information on this command (it's not an easy name to search for!) Do you have any links you could share please?

kps 3716 days ago

Apologies, I forgot to include a link: The toolkit is now at https://github.com/att/ast since AT&T laid off the group a couple years ago.

About half the package consists of evolutions of traditional Unix commands. The parts I use regularly are ksh and tw. tw ('tree walk') is sort of a 'find --exec' replacement with a C-like selector syntax. It's a bit verbose, so for interactive use I generally set up project-specific shell aliases with selection expressions, e.g.

  alias cctw=$'tw -e "select: return (type == REG) && ((name == \'*.c\') || (name == \'*.h\') || (name == \'*.cpp\') || (name == \'*.cc\') || (name == \'*.hpp\') || (name == \'*.mm\') || (name == \'*.inc\'));" '

and then use those, e.g.

  cctw egrep -w MyIdentifier
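For readers without tw installed, a rough find(1) approximation of that alias. This is only a sketch of similar behaviour, not a description of how tw works; `ccfind` is a hypothetical name and not part of the ast toolkit.

```shell
# Hypothetical helper, not part of ast: run a command on C-family sources
# under the current directory, roughly what the cctw alias selects.
ccfind() {
    find . -type f \( -name '*.c' -o -name '*.h' -o -name '*.cpp' \
        -o -name '*.cc' -o -name '*.hpp' -o -name '*.mm' -o -name '*.inc' \) \
        -exec "$@" {} +
}

demo_dir=$(mktemp -d)
printf 'int MyIdentifier;\n' > "$demo_dir/my file.c"
printf 'MyIdentifier\n' > "$demo_dir/notes.txt"

found=$(cd "$demo_dir" && ccfind grep -l -w MyIdentifier)
echo "$found"    # ./my file.c
rm -rf "$demo_dir"
```

Because find passes the paths directly to the command, the space in "my file.c" needs no special handling.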

txutxu 3716 days ago

If I needed to combine find|grep right now (that is, if grep's own directory recursion and filters weren't enough, which may be a corner case too...) I would do it like this:

    # -print0 and read -d '' delimit names with NUL, so spaces and newlines survive
    while IFS= read -rd '' file; do
      echo "do whatever with: $file"
      grep whatever -- "$file"
    done < <(find ~/whatevers -print0)

It's like natural language if you do it daily.

Will handle not only spaces, but also

new

lines

on

file

names.
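A sketch of a loop-free variant of the same idea, assuming only find's standard `-exec ... {} +` form. The directory and file contents are invented for the demonstration.

```shell
demo_dir=$(mktemp -d)
printf 'whatever\n' > "$demo_dir/with space.txt"
printf 'whatever\n' > "$demo_dir/with
newline.txt"
printf 'nothing here\n' > "$demo_dir/other.txt"

# find hands each path to grep as one argument; nothing is ever split.
# -h suppresses the file-name prefix so only matching lines are counted.
hits=$(find "$demo_dir" -type f -exec grep -h -- whatever {} + | wc -l | tr -d ' ')

echo "matching lines: $hits"    # matching lines: 2
rm -rf "$demo_dir"
```

The loop is still the better fit when each file needs more than one command run on it.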

Have a nice day.

joosters 3716 days ago

You miss my point. Of course, for every example I give, it's possible to build a workaround to handle the spaces. My point is, it's the very fact that you need a workaround that makes it so irritating.

txutxu 3715 days ago

For me, the code I gave is not a workaround.

It's the canonical way to do it.

Other ways, even if they are "expected to work by inexperienced occasional users"... are simply flawed at first glance.

A workaround is to ditch shell scripting as soon as you face a problem, blame it, and redo the job in a "more advanced language" that has the same caveats or more. That could be a workaround.

Delimiting file names with null bytes, in case they could be split by any of the $IFS values, is NOT a workaround; it's pure logic.
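A short sketch of why NUL is the safe delimiter, assuming a find that supports -print0: a file name may contain a newline, but never a NUL byte. The variable names are hypothetical.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/with
newline"

# Newline-delimited: the newline inside one name is counted as an extra record.
by_newline=$(find "$demo_dir" -type f | wc -l | tr -d ' ')
# NUL-delimited: exactly one NUL per file, so counting NULs counts files.
by_nul=$(find "$demo_dir" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline-delimited: $by_newline, NUL-delimited: $by_nul"
rm -rf "$demo_dir"
```

With three files on disk, only the NUL-delimited count is reliable.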

dfcowell 3716 days ago

That there are worse things that exist is no reason not to pluck the low hanging fruit. I see this attitude on HN a lot. Spaces are probably the 80 in the 80/20 rule here. Why not address them?

txutxu 3716 days ago

Please give a single example in shell about spaces that has not already been addressed.

Where do you get that rule from?

tomsmeding 3716 days ago

It's the Pareto principle. In this case he means "20% of the problems account for 80% of the trouble" -- albeit somewhat strangely worded.

txutxu 3716 days ago

Thanks, I did know about the theory. My question was more about where the numbers come from.

I write shell daily, and those are not my numbers regarding "spaces in filenames issues", sorry.

txutxu 3716 days ago

Not sure why this comment got down-voted.

If it's because I did not explain how to do that, here is how to do it:

    http://mywiki.wooledge.org/BashFAQ/004

If it's because of the comment on the given example...

    $ touch file1
    $ ls | wc -l
    1
    $ touch "file 2"
    $ ls | wc -l
    2

... (?)

If we were talking about "new lines in file names", or "dashes at the beginning of file names", or code injection through file names, then we could be talking of more complex solutions.

But the space issues in shell are simple, and have known solutions. If you're a daily user, or you're past the learning stage, spaces don't turn out to be an issue.
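As an illustration of the leading-dash case mentioned above, a minimal sketch of the two usual guards, assuming a cat that honours `--` as the end of options. The file name `-n` and the variables are chosen only for the demonstration.

```shell
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/-n"

# Inside the directory the file is just "-n", which cat would read as an option.
via_dashdash=$(cd "$demo_dir" && cat -- -n)    # "--" ends option parsing
via_prefix=$(cd "$demo_dir" && cat ./-n)       # "./" makes it look like a path

echo "$via_dashdash / $via_prefix"    # hello / hello
rm -rf "$demo_dir"
```

Globbing with `./*` instead of `*` applies the second guard to every name at once.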

CaptSpify 3716 days ago

I'm guessing it was down-voted because you keep missing the point. Yes, there are work-arounds for this scenario, but the overhead in remembering the work-arounds is the problem. He's not saying you can't do it, he's saying the problem is that you have to change your setup for edge-cases, which are actually fairly common.

txutxu 3716 days ago

Indeed, edge cases keep hitting you until you know them and how to work around them. Is that a shell-specific issue? I think it is a universal issue.

A big problem is that people who underestimate an unfamiliar technology do not take the time to learn it properly.

Many programmers (and many beginners) think they "know" shell, so they don't invest more time and tests, and then they keep facing corner cases, facing known, documented and solved issues, etc...

You are asserting that learning bash is harder than learning the corner cases of other more advanced languages. Is that what you're saying? Do you think that "bash" has more corner cases than ... (?)

How can we patch that? With a web shell?

Totally true, I've totally missed the point.

NoGravitas 3716 days ago

It's not even so much the overhead in remembering the work-arounds, as remembering when exactly you need the work-arounds.

txutxu 3715 days ago

Shell scripts split on spaces (or the values of $IFS) by design.

For example, use null-delimited values, because a variable (like IFS) cannot contain null bytes. Is this a workaround? I see it as pure logic.

Even understanding how to handle them helps to understand the internal design of the shell and common external utilities.

Remembering is hard? That's not a "shell"-specific issue... I use it daily and maybe that's why I don't face the same problems that others see so clearly.

CaptSpify 3716 days ago

haha, yes, that's what I meant. Your phrasing is better

CJefferson 3716 days ago

But, I need to remember to do that.

How about (for example):

    $ for i in $(ls | grep 'cheese\|fish' | head -n 5) ...

I'm sure there is a way to make this work correctly (take 5 filenames containing either cheese or fish), but I'd need to put more thought into making it work correctly.

kps 3716 days ago

  ls | grep 'cheese\|fish' | head -n 5 | while read -r i …
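The `while read -r` form keeps inner spaces intact because each line lands in a single variable. A small sketch of one remaining wrinkle, assuming sh or bash with the default IFS: without `IFS=`, read trims leading and trailing whitespace. The sample name and variables are hypothetical.

```shell
# A name with leading and trailing spaces, fed through a pipe as one line.
padded='  fish fingers  '

plain_read=$(printf '%s\n' "$padded" | { read -r i; printf '%s' "$i"; })
ifs_read=$(printf '%s\n' "$padded" | { IFS= read -r i; printf '%s' "$i"; })

printf '[%s]\n' "$plain_read"    # [fish fingers]
printf '[%s]\n' "$ifs_read"      # [  fish fingers  ]
```

Names containing newlines still break any line-based pipeline, with or without `IFS=`.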

Symbiote 3716 days ago

With Zsh:

  > touch {dark\ red,green,light\ blue}{cheese,fish\ fingers,fruit}
  > ls -1 *(cheese|fish)*([1,5])
  dark redcheese
  dark redfish fingers
  greencheese
  greenfish fingers
  light bluecheese
  > for i in *(cheese|fish)*([1,5]) ...

Since that's shell globbing, it can cope with any spaces, newlines etc.
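Without Zsh's glob qualifiers, the same globbing idea can be sketched in plain sh or bash with a counter. `first_matches` is a hypothetical helper, and the files are the ones from the touch line above, created in a temporary directory.

```shell
# Hypothetical helper: print up to $1 names from the current directory
# that contain "cheese" or "fish", using only globbing and case.
first_matches() {
    limit=$1
    found=0
    for name in *; do
        [ "$found" -lt "$limit" ] || break
        case $name in
            *cheese*|*fish*)
                printf '%s\n' "$name"    # stand-in for the real work on "$name"
                found=$((found + 1))
                ;;
        esac
    done
}

demo_dir=$(mktemp -d)
for colour in "dark red" green "light blue"; do
    for food in cheese "fish fingers" fruit; do
        touch "$demo_dir/$colour$food"
    done
done

picked=$(cd "$demo_dir" && first_matches 5)
printf '%s\n' "$picked"
rm -rf "$demo_dir"
```

It is wordier than the Zsh one-liner, but the names never pass through a pipe, so spaces and newlines in them are harmless inside the loop.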

CJefferson 3716 days ago

Wow, that is quite impressive, I might have to look into this more. The advantage of using head is I only have to learn it once, to get the first n of lines or files, but it might be worth learning. Thanks.

Symbiote 3715 days ago

I know 25% of the special Zsh syntax, and know of a further 25% of what exists.

I read the whole manpage some years ago, and noted down what seemed useful. I use "man zshexpn" when I forget the syntax for things like this.