by ZenoArrow 3716 days ago
Those can all be seen as tweaks to existing shells. With regards to...

"* Have some simple, easy to follow rules which let me work on files that have spaces in their names, without having to remember the various commands with various special cases (like -print0)."

I'm not sure I understand what's hard about files with spaces in their names. I'd say it's easy enough to work with such filepaths by using tab autocomplete when using the shell interactively, and quote marks when writing shell scripts. Can you give an example of where these approaches wouldn't work?

When you pipe the output of one command into another, e.g. 'ls | wc' (obviously a dumb example), the second command will split the filenames on spaces and so will not run properly.

The workarounds for this all involve nasty extra parameters for different commands (e.g. the -print0 example)

ls is not bash. wc is not bash.

And that is a discouraged way to count files in a shell script.

Still, simply adding -l to wc would handle spaces correctly (for the file count).

I insist, spaces are not your enemy; there are much weirder file names for a shell to deal with. The shell can handle spaces if used properly.

You miss my point. Of course, for every example I give, it's possible to build a workaround to handle the spaces. My point is, it's the very fact that you need a workaround that makes it so irritating.

'command1 | command2' just works in most circumstances, so it's frustrating that it falls apart when a filename with a space appears.

And it technically is a shell issue, insomuch as the shell is dividing up the ARGV for each program. The shell is perhaps not to blame, because it can't tell the difference between a filename that has a space in it, and ordinary output that just so happens to correspond with a filename. In other words, it's hard to see what a shell could do to make things better. But the problem still exists.
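To make the ARGV point concrete, here is a minimal sketch of the one place where the shell itself does the splitting: an unquoted expansion. The `count_args` helper and the file name are made up for illustration; in the pipe case it is the receiving program, not the shell, that decides how to split its input.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

name="a file"

# Unquoted expansion: the shell splits the value on $IFS before building ARGV.
unquoted=$(count_args $name)

# Quoted expansion: the value reaches the command as a single argument.
quoted=$(count_args "$name")

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS this prints "unquoted: 2, quoted: 1".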

wc is not a builtin.

In that example we're not facing a bash word-splitting issue.

I agree with you that shell scripting has caveats one needs to learn. As do Perl, C, PHP, Ruby, Node, Go, Java and whatnot.

I don't feel a big change is needed to handle spaces in shell scripts; my scripts handle them and I enjoy writing them. Maybe you know of minor tweaks for bash, zsh or any other common shell that would be generally useful for files with spaces in their names? Don't hesitate to file a bug with them; maybe we'll even get a fix.

But don't send them this example, and insist on it, because the conversation is over:

    $ touch a_file
    $ ls | wc
          1       1       7
    $ rm a_file
    $ touch "a file"
    $ ls | wc
          1       2       7
Equivalent input, with and without spaces, and the expected output.

The 2 is the word count, and we did pass two words. I wouldn't expect a 1 there; _that_ would be a bug.

Once again, you're getting too hung up on my dumb little example, which I spent exactly 0 seconds thinking about. It's the general problem that's interesting (and annoying), the 'command1 | command2' general case.

If you want a difficult example, then take a more real-world example: e.g. the workflow of a 'find [some stuff] |grep [some other stuff]' is one to consider. That's where horrid workarounds like -print0 and -z have to come in, but the simple 'find|grep' works fine up until a file has a space in it.

As I said, there's no simple fix, even for the re-organised form of 'grep [some other stuff] `find [some stuff]` because the shell can't tell the difference between a filename and just a stream of text in the output of one program.
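For the specific find/grep case (not the general 'command1 | command2' problem), one option is to let find run grep itself, so that no text stream ever has to be split back into file names. This is only a sketch: the scratch directory, the file names and the search string "needle" are assumptions for illustration.

```shell
# Hypothetical scratch directory and file names, for illustration only.
dir=$(mktemp -d)
printf 'needle\n' > "$dir/a file.txt"
printf 'hay\n' > "$dir/plain.txt"

# find hands each path to grep as one argument, so nothing gets re-split.
matches=$(find "$dir" -type f -exec grep -l needle {} +)
echo "$matches"

rm -r "$dir"
```

It still means remembering `-exec ... {} +`, so it arguably counts as one more workaround rather than a fix.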

The (ex) AT&T Research command tw(1) has pretty much replaced find(1) for me (particularly with some canned search selectors for particular projects).

If I needed to combine find and grep right now (that is, if grep's own directory recursion and filters weren't enough, which may be a corner case too...), I could do it like this:

    while IFS= read -rd '' file; do
      echo "do whatever with: $file"
      grep whatever -- "$file"
    done < <(find ~/whatevers -print0)
It's like natural language if you do it daily.

Will handle not only spaces, but also

new

lines

on

file

names.

Have a nice day.

That there are worse things that exist is no reason not to pluck the low-hanging fruit. I see this attitude on HN a lot. Spaces are probably the 80 in the 80/20 rule here. Why not address them?

Please give a single shell example involving spaces that has not already been addressed.

Where do you get that rule from?

It's the Pareto principle. In this case he means "20% of the problems account for 80% of the trouble" -- albeit somewhat strangely worded.

Thanks, I did know about the theory. My question was more about where the numbers are coming from.

I write shell daily, and those are not my numbers regarding "spaces in filenames" issues, sorry.

Not sure why this comment got down-voted.

If it's because I did not explain how to do that, here is how to do it:

    http://mywiki.wooledge.org/BashFAQ/004

If it's because of the comment on the given example...

    $ touch file1
    $ ls | wc -l
    1
    $ touch "file 2"
    $ ls | wc -l
    2
... (?)

If we were talking about "new lines in file names", or "dashes at the beginning of file names", or code injection through file names, then we could be talking of more complex solutions.

But the space issues in shell are simple, and have known solutions. If you're a daily user, or you're past the learning stage, spaces don't turn out to be an issue.

I'm guessing it was down-voted because you keep missing the point. Yes, there are work-arounds for this scenario, but the overhead in remembering the work-arounds is the problem. He's not saying you can't do it, he's saying the problem is that you have to change your setup for edge-cases, which are actually fairly common.

Indeed, edge cases keep hitting you until you know them and how to work around them. Is that a shell-specific issue? I think it's a universal one.

A big problem is that people who underestimate an unfamiliar technology don't take the time to learn it properly.

Many programmers (and many beginners) think they "know" shell, so they don't invest more time in learning and testing, and then they keep running into corner cases that are known, documented and solved.

You are asserting that learning bash is harder than learning the corner cases of other, more advanced languages. Is that what you're saying? Do you think that "bash" has more corner cases than ... (?)

How can we patch that? With a web shell?

Totally true, I've totally missed the point.

It's not even so much the overhead in remembering the work-arounds, as remembering when exactly you need the work-arounds.

Shell scripts split on spaces (or on the characters in $IFS) by design.

For example, use null-delimited values, because neither a file name nor a shell variable (like IFS) can contain a null byte. Is this a workaround? I see it as pure logic.
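A rough sketch of what NUL delimiting buys you, assuming a find and xargs that support `-print0` and `-0` (the GNU and BSD versions do). The scratch directory and file names are made up; the inner `sh -c 'echo "$#"'` just reports how many arguments xargs passed along.

```shell
# Hypothetical scratch directory with one awkward and one plain file name.
dir=$(mktemp -d)
touch "$dir/a file" "$dir/plain"

# Whitespace-delimited: xargs splits "./a file" into two arguments.
split_count=$(cd "$dir" && find . -type f | xargs sh -c 'echo "$#"' sh)

# NUL-delimited: every path arrives as exactly one argument.
nul_count=$(cd "$dir" && find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "plain pipe: $split_count arguments, NUL-delimited: $nul_count arguments"
rm -r "$dir"
```

Two files go in; the plain pipe reports 3 arguments and the NUL-delimited one reports 2.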

Even just understanding how to handle them helps you understand the internal design of the shell and the common external utilities.

Remembering is hard? That's not a "shell"-specific issue... I use it daily, and maybe that's why I don't face the problems that others see so clearly.

Haha, yes, that's what I meant. Your phrasing is better.

But I need to remember to do that.

How about (for example):

    $ for i in $(ls | grep 'cheese\|fish' | head -n 5) ...
I'm sure there is a way to make this work correctly (take 5 filenames containing either cheese or fish), but I'd need to put more thought into it.

  ls | grep 'cheese\|fish' | head -n 5 | while read -r i …

With Zsh:

  > touch {dark\ red,green,light\ blue}{cheese,fish\ fingers,fruit}
  > ls -1 *(cheese|fish)*([1,5])
  dark redcheese
  dark redfish fingers
  greencheese
  greenfish fingers
  light bluecheese
  > for i in *(cheese|fish)*([1,5]) ...
Since that's shell globbing, it can cope with any spaces, newlines etc.
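For shells without zsh's glob qualifiers, a rough equivalent can be sketched with a plain glob, a `case` pattern and a counter. This is just one possible approach in POSIX sh; the scratch directory and the `n` and `picked` variables are hypothetical, and the file names are the ones from the zsh example.

```shell
# Hypothetical scratch directory populated with the same nine names as above.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch "dark redcheese" "dark redfish fingers" "dark redfruit" \
      "greencheese" "greenfish fingers" "greenfruit" \
      "light bluecheese" "light bluefish fingers" "light bluefruit"

n=0
picked=""
for f in *; do                  # globbing never re-splits names on whitespace
  case $f in
    *cheese*|*fish*)
      n=$((n + 1))
      picked="$picked[$f]"      # "do whatever" with "$f" here
      if [ "$n" -ge 5 ]; then break; fi
      ;;
  esac
done
echo "$picked"

cd "$OLDPWD" && rm -r "$dir"
```

It is wordier than the zsh one-liner, which is more or less the complaint being made in this thread.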
wc, in its basic form, is made for counting words. A filename in your case is not a word. I don't see the problem here. Use wc in line mode and the "problem" is solved.

It also has nothing to do with the shell, nor does the general case of pipes you present below. Your shell redirects the output from ls to the input of wc. If you don't like the simple approach that tools consume and produce arbitrary text in whatever manner they see fit for their purpose, maybe it's your operating system that you have a beef with.