If the goal is to benchmark Ruby execution time, it's unfair. The slow startup time is a valid concern when considering short-lived sessions, but it's kind of meaningless when trying to benchmark a Ruby implementation.
There are no 'fair' benchmarks, all benchmarks should be biased to the problem you're actually solving running your actual workload. If you can't replay your workload at multiples of real volume then you should probably work on doing that before benchmarking as it helps you out with the real problem of verifying your infrastructure.
In general a benchmark is probably the worst metric you could ever use for deciding on an implementation. Unless the profit margin of your business is razor thin and dependant eeking out every last drop of performance, and even then most of those gains will be from extremely small sections of code that are probably best written in assembler by a programming God, and you should investigate FPGAs, ASICs, and other high performance solutions.
If your benchmark (infrastructure) involves a database (or anything that uses disks) that's probably going to be the problem long before the speed of your language / language implementation.
In all comparisons, you should remove confounding variables. Yes, you should benchmark something you actually care about, otherwise what's the point? That doesn't mean all other variables are immediately null and void. That's why I said said if your goal is to measure ruby execution time, you should remove startup time.
As for the practice of benchmarking in general, you're partially right. Micro-benchmarks are usually useless because they don't map to real work load. But profiling and speeding up small portions that are used heavily can have drastic improvements that in isolation seem small -- the death by a thousand cuts problem. Not all improvements come from isolated instances with very slow performance profiles.
This fallacy about DB access and not needing to optimize really needs to go away though. Even if 50% of your app is spent hitting DB, you have opportunity to speed up the other 50% and it's likely far easier. Ruby in particular is ripe for improvements on the CPU side. I managed to reduce my entire test suite time by 30% by speeding up psych. I managed to cut the number of servers I need in EC2 in half by switching from MRI, Pasenger, and resque to JRuby, TorqueBox, and Sidekiq. And I've managed to speed up my page rendering time anywhere from 8 - 40x by switching from haml to slim. None of these changes required modifications to my DB, none required me to write assembly, none required me to switch to custom-built hardware, and each helped reduce the expenses for my bootstrapped startup, while improving the overall experience for my customers.
Agreed. But then you're not benchmarking Ruby execution speed, which really makes comparing the two not very worthwhile. By the same notion, then when you compare on something like JRuby, you really should be comparing JDK 6, 7, and 8 builds, along with various startup flags, both JVM and JRuby. E.g., short runs may benefit greatly from turning off JRuby JIT, JAR verification, and using tiered compilation modes. May as well add Nailgun and drip in there as well, since both can speed up startup time on subsequent runs. Which one of these is going to be the JRuby you use in your comparisons? Or are you really going to show 20 different configurations?
You can run into the same problem with MRI and its GC settings. If too low for your test, you're going to hit GC hard. It's best to normalize that out so you have an even comparison. Confounding variables and all that.
There's a lot you can do outside the Ruby container to influence startup time. When comparing two implementations, the defaults are certainly something to consider, but not when trying to see which actually executes Ruby faster. They are two different metrics of performance and should be compared in isolation.
In production server apps the JVM always running anyway, and under some degree of Hotspot optimization, so for a JRuby benchmark to be informative and worth anything to you, you'll want to account for that.