That manual counting of cores is what I thought we would let `parallel` handle! I have never actually used `parallel`, though, so I don't know how best to do it.
Right, I was unclear. What I really meant was that `parallel` could automatically spin up more and more jobs until it senses that there is no further performance to be gained. I'm not sure to what extent that is true, though.
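For what it's worth, the "keep adding jobs until it stops helping" idea can be sketched in a few lines of Python. This is only an illustration of the ramp-up logic, not a description of what `parallel` does internally. The names `measure_throughput` and `find_saturation`, the doubling strategy, and the 5% gain threshold are all assumptions made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(task, jobs, n_tasks=200):
    """Run `task` n_tasks times on `jobs` worker threads; return tasks per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Consume the iterator so every task has finished before the clock stops.
        list(pool.map(lambda _: task(), range(n_tasks)))
    return n_tasks / (time.perf_counter() - start)


def find_saturation(measure, start=1, max_jobs=64, min_gain=0.05):
    """Double the job count until throughput improves by less than min_gain.

    `measure` is any callable mapping a job count to a throughput figure.
    """
    best_jobs, best_rate = start, measure(start)
    jobs = start * 2
    while jobs <= max_jobs:
        rate = measure(jobs)
        if rate < best_rate * (1 + min_gain):
            break  # no meaningful gain: treat the previous level as saturated
        best_jobs, best_rate = jobs, rate
        jobs *= 2
    return best_jobs
```

With a real workload you would call something like `find_saturation(lambda j: measure_throughput(my_task, j))`, where `my_task` is a placeholder for whatever unit of work you care about. Real measurements are noisy, so in practice you would probably want to average a few runs per level before deciding that the gain has flattened out.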
A shell script wouldn't be enough (I've tried). However, there are plenty of CLI-based load-testing tools, including one I've written myself. And if you need something more advanced, there is always Gatling, which runs from the command line, produces proper HTML reports and graphs, and is extended in code (e.g. in Scala) rather than through GUI controls.