by jmcnulty 1787 days ago

The long rebuttal from bgoldst fails to answer the question of how to solve this "in bash" when he introduces additional commands like tr(1) and sed(1). You should avoid using additional programs to perform actions where bash builtins can do the job. The extra overhead and impact on runtime of context switching to load in a new program is non-trivial if you have to loop over it thousands of times. It's better to normalize the string data for use with 'read' using builtin string substitution.

  $ string="Los Angeles, London, Belfast, New York"
  $ IFS="," read -r -a array <<< "${string//, /,}"
  
  $ echo "${array[0]}"
  Los Angeles
  
  $ echo "${array[1]}"
  London

..etc.

Don't have the free time today to read the rest of it unfortunately.

2 comments

pizza234 1787 days ago

> The extra overhead and impact on runtime of context switching to load in a new program is non-trivial if you have to loop over it thousands of times

The specification is: "speed does not matter".

The long answer addresses this solution:

  $ string="Los Angeles, London, Belfast, New York"
  $ IFS="," read -r -a array <<< "${string//, /,}"
  
  $ echo "${array[0]}"
  Los Angeles
  
  $ echo "${array[1]}"
  London

as "not very generic" in point #3, which is correct. Bash simply doesn't support generic splitting by itself (things go downhill quickly once, for example, newlines are introduced), and if precision and flexibility take priority over speed, then it's better to use standard Linux tools.

jmcnulty 1787 days ago

If you have newlines present then process the data a line at a time, as you would if reading from a file. This is nowhere near as difficult or cumbersome as you're making out.
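
A minimal sketch of the line-at-a-time approach, assuming the input is held in a variable named data (a made-up example, not from the original question) with one comma-separated record per line. When reading from a file you would redirect the loop from the file instead.

```bash
#!/usr/bin/env bash
# Hypothetical input: one comma-separated record per line.
data=$'Los Angeles, London\nBelfast, New York'

cities=()
while IFS= read -r line; do
  # Normalize ", " to "," on this line, then split it with the read builtin.
  IFS=',' read -r -a fields <<< "${line//, /,}"
  cities+=("${fields[@]}")
done <<< "$data"

# Print one city per line.
printf '%s\n' "${cities[@]}"
```

Each line is split separately, so a newline never reaches the inner read as part of a field.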

pizza234 1787 days ago

One certainly can, but the increase in complexity shows that Bash stops being the most effective tool once it is used for tasks it isn't designed for (and compared to a full-blown programming language, there are many such tasks).

chubot 1787 days ago

The pure bash solution has overhead too. If you need to split 1000 strings it will create, write, and read 1000 temp files. Depending on your hardware and file system, that's more expensive than creating 1000 or 2000 processes.

I would worry about making it correct before making it fast, the former being a big challenge!

Shells Use Temp Files to Implement Here Documents: http://www.oilshell.org/blog/2016/10/18.html

(Oil doesn't do this; it creates a process for here docs without touching disk. In theory this could be eliminated for here docs less than PIPE_BUF, which is probably a lot of them)

jmcnulty 1787 days ago

It's the read/readarray builtin that's creating the tmpfiles and that's not great, but the string substitution doesn't. My point was there's no need to call out to another program to do something that bash is capable of doing itself.

chubot 1787 days ago

No, it's the here doc, including here strings. See the blog post, which doesn't use read or readarray.

jmcnulty 1787 days ago

Understood. Did some quick tests myself and see what you mean.

Here's a version that doesn't use a here string and so doesn't create temporary files.

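  #!/bin/bash

  # Run the last pipeline stage in this shell so arrayA survives the pipe.
  shopt -s lastpipe

  string="Los Angeles, London, Belfast, New York"
  # // replaces every ", "; printf avoids the trailing newline echo would add.
  printf '%s' "${string//, /,}" | readarray -d, -t arrayA
  echo "${arrayA[0]}"
  echo "${arrayA[1]}"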

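Also, the lastpipe option runs readarray in the current shell rather than in a subshell, so arrayA is still set after the pipeline finishes. It only takes effect when job control is off, as it is in a script.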
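
For comparison, here is a rough sketch that splits using only parameter expansion, so it needs neither a here string nor a pipe. split_string is a hypothetical helper, not something proposed in the thread, and it assumes bash 4.3 or later for the nameref.

```bash
#!/usr/bin/env bash
# split_string is a hypothetical helper, not part of bash.
# Usage: split_string <string> <delimiter> <name of output array>
# The output array must not be named "out", "rest" or "delim".
split_string() {
  local rest=$1 delim=$2
  local -n out=$3                 # nameref to the caller's array (bash 4.3+)
  out=()
  while [[ $rest == *"$delim"* ]]; do
    out+=("${rest%%"$delim"*}")   # text before the first delimiter
    rest=${rest#*"$delim"}        # drop that text and the delimiter
  done
  out+=("$rest")                  # whatever follows the last delimiter
}

string="Los Angeles, London, Belfast, New York"
split_string "$string" ", " cities
printf '%s\n' "${cities[@]}"
```

Because the delimiter is quoted inside the expansions it is matched literally, so a multi-character separator such as ", " works without normalizing the string first.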
