HN Mirror

Hacker News


	by laurenth 695 days ago
	Author here. That's correct! Unlike Bash and other modern shells, POSIX sh doesn't have arrays or any other data structures. We got around this limitation by using arithmetic expansion and indexed shell variables (the ones starting with `_`, as you noted) to get random-access memory.
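A minimal sketch of the idea. It assumes integer-only cells, since arithmetic expansion only handles integers. The helper names `alloc`, `store` and `load` and the `heap_top`/`result` variables are hypothetical and are not taken from Pnut's actual runtime:

```sh
#!/bin/sh
# Sketch only, not Pnut's runtime: emulate a flat memory in POSIX sh by
# giving each cell its own variable named _<address>.
# alloc/store/load, heap_top and result are hypothetical names.

heap_top=0   # next free address

# alloc SIZE: reserve SIZE consecutive cells; the base address goes in $result
alloc() {
  result=$heap_top
  : $((heap_top += $1))
}

# store ADDR VALUE: parameter expansion turns _$1 into e.g. _5, then the
# arithmetic assignment sets that variable to VALUE
store() {
  : $((_$1 = $2))
}

# load ADDR: copy the cell at ADDR into $result
load() {
  : $((result = _$1))
}

# Usage: a 3-cell "array", indexed with arithmetic on its base address
alloc 3
arr=$result
store $((arr + 0)) 10
store $((arr + 1)) 20
store $((arr + 2)) 30
load $((arr + 1))
echo "$result"   # prints 20
```

Because the address is just a number, pointer arithmetic like `$((arr + i))` gives random access without any array support from the shell.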

osmsucks 695 days ago

I experimented with something similar in the past to mimic multidimensional arrays, and depending on the implementation this can absolutely _kill_ performance. IIRC, Dash looks up variable names linearly, so when you create tons of variables, each lookup takes longer and longer.

n_plus_1_acc 695 days ago

I hope you're not compiling C to sh for performance reasons.

osmsucks 695 days ago

It's not about performance, it's about viability. If the result is so slow that it's unusable, it doesn't matter how portable it ends up being.

laurenth 695 days ago

We haven't found this to be an issue for Pnut. One of the metrics we use for performance is how much time it takes to bootstrap Pnut, and Dash takes around a minute, which is about the time taken by Bash. This is with Pnut allocating around 150KB of memory when compiling itself, showing that Dash can still be useful even when hundreds of KBs are allocated.

One thing we did notice is that subshells can be a bottleneck when the environment is large, and so we avoided subshells as much as possible in the runtime library. Did you observe the same in your testing?
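To illustrate what "avoiding subshells" can look like in a runtime library, here is a sketch of two ways for a shell function to return a value. The function names and the `$result` convention are hypothetical. Command substitution runs in a subshell environment, which most shells implement with a fork, while writing to a global variable stays in the current process:

```sh
#!/bin/sh
# Sketch: two ways for a shell function to hand back a value.
# square_print, square_global and $result are hypothetical names.

# Style 1: print the value and capture it with $(...). Command substitution
# runs in a subshell environment (usually a fork), which can get expensive
# when the parent shell holds a lot of state.
square_print() {
  echo "$(( $1 * $1 ))"
}

# Style 2: leave the value in a global variable; no subshell is involved.
square_global() {
  result=$(( $1 * $1 ))
}

a=$(square_print 7)   # subshell
square_global 7       # current shell
b=$result

echo "$a $b"   # prints 49 49
```

Both styles compute the same value; the second trades a bit of readability for never leaving the current shell process.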

osmsucks 694 days ago

> We haven't found this to be an issue for Pnut. One of the metrics we use for performance is how much time it takes to bootstrap Pnut, and Dash takes around a minute, which is about the time taken by Bash. This is with Pnut allocating around 150KB of memory when compiling itself, showing that Dash can still be useful even when hundreds of KBs are allocated.

Interesting. When you say "even when hundreds of KBs are allocated", do you mean this is allocating variables with large values, or tons of small variables? My case was the latter, and with that I saw a noticeable slowdown on Dash.

Simplest repro case:

  $ cat many_vars_bench.sh
  #!/bin/sh
  
  _side=500
  
  i=0
  while [ "${i}" -lt "${_side}" ]; do
    j=0
    while [ "${j}" -lt "${_side}" ]; do
      eval "matrix_${i}_${j}=$((i+j))" || exit 1
      : $(( j+=1 ))
    done
    i=$((i+1))
  done
  
  $ time bash many_vars_bench.sh
  5.60user 0.12system 0:05.78elapsed 99%CPU (0avgtext+0avgdata 57636maxresident)k
  0inputs+0outputs (0major+13020minor)pagefaults 0swaps
  
  $ time dash many_vars_bench.sh
  40.75user 0.14system 0:41.22elapsed 99%CPU (0avgtext+0avgdata 19972maxresident)k
  0inputs+0outputs (0major+4951minor)pagefaults 0swaps

Dash was ~7 times slower (41.22s vs 5.78s elapsed). Increase the side of the square "matrix" for a proportionally bigger slowdown (this one uses 250003 variables: 500 × 500 matrix cells plus `_side`, `i` and `j`).
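A variant of the repro above that takes the matrix side as its first argument, so the variable count (side × side) can be scaled without editing the script. The `fill_matrix`/`get_cell` helpers are hypothetical, and the default side is a small 50 here so it finishes quickly:

```sh
#!/bin/sh
# Sketch: same benchmark, with the side taken from $1 (default 50).
# fill_matrix and get_cell are hypothetical helper names.

# fill_matrix SIDE: create SIDE*SIDE variables matrix_<i>_<j> = i + j
fill_matrix() {
  i=0
  while [ "$i" -lt "$1" ]; do
    j=0
    while [ "$j" -lt "$1" ]; do
      eval "matrix_${i}_${j}=$((i + j))" || return 1
      j=$((j + 1))
    done
    i=$((i + 1))
  done
}

# get_cell I J: copy matrix_<I>_<J> into $result (empty if it doesn't exist)
get_cell() {
  eval "result=\${matrix_$1_$2}"
}

fill_matrix "${1:-50}"
get_cell 3 4
echo "$result"   # prints 7 for any side of at least 5
```

Saved under some name such as `bench.sh` (hypothetical), it could be timed at several sizes, e.g. `time dash bench.sh 250` and `time dash bench.sh 1000`, to see how the gap with Bash changes as the number of variables grows.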

> One thing we did notice is that subshells can be a bottleneck when the environment is large, and so we avoided subshells as much as possible in the runtime library. Did you observe the same in your testing?

Yes, launching a new process is generally expensive, and so is spawning a subshell. If the shell is something like Bash (with a lot of startup/environment setup cost), then you'll feel this more than with something like Dash, where the whole point was to make the shell small and snappy for init scripts: https://wiki.ubuntu.com/DashAsBinSh#Why_was_this_change_made...

In my limited testing, Bash generally came out on top for single-process performance, while Dash came out on top for scripts with more use of subshells.

thesnide 695 days ago

I used almost the same idea, but with files, in my shlibs project: https://github.com/steveschnepp/shlibs
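For comparison, a sketch of a file-backed variant of the same idea, where each cell lives in its own file instead of its own variable. This is NOT the shlibs API: the `fstore`/`fload` helpers and the directory layout are hypothetical, and it assumes `mktemp -d` is available (GNU and BSD systems have it):

```sh
#!/bin/sh
# Sketch of a file-backed "array": one file per cell in a scratch directory.
# fstore, fload and heap_dir are hypothetical names, not taken from shlibs.
heap_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$heap_dir"' EXIT   # clean up the cells when the script exits

# fstore ADDR VALUE: write VALUE into the file for cell ADDR
fstore() {
  printf '%s\n' "$2" > "$heap_dir/$1"
}

# fload ADDR: read cell ADDR into $result (empty if it was never written);
# read is a builtin, so no subshell or extra process is needed
fload() {
  result=
  [ -f "$heap_dir/$1" ] && IFS= read -r result < "$heap_dir/$1"
}

fstore 0 10
fstore 1 hello
fload 1
echo "$result"   # prints hello
```

Unlike the variable-based version, cells here can hold arbitrary single-line strings, and the shell's own variable table stays small, at the cost of a filesystem access per load and store.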
