Specifically which part of "record the variables" do you not understand?
The number of variables simply has no impact on the difficulty of running a benchmark. It does affect how relevant or useful the benchmark is, but interpretation and application are the user's problem. It has zero impact on the process of performing or recording the benchmark. This matters because interpretation and application get easier with more benchmarks covering more cases, and your insistence that it's "hard" can only reduce that number, which makes you the only variable that is actually a problem.
When running on Wine, how you are running a game can have as much impact on performance as your specs.
Identifying the differences between setups is hard.
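For what it's worth, if each benchmark did come with its variables written down as plain key/value pairs, comparing two setups could be sketched roughly like this. This is only an illustration: the helper name diff_setups, the field names and the values are all made up, and it can only surface variables that someone actually recorded.

```python
def diff_setups(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key missing from one record shows up with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical records: field names and values are placeholders, not real data.
setup_a = {"gpu": "example-gpu", "wine_version": "x.y", "esync": True}
setup_b = {
    "gpu": "example-gpu",
    "wine_version": "x.z",
    "esync": False,
    "prefix_arch": "win64",
}

differences = diff_setups(setup_a, setup_b)
for key, (left, right) in differences.items():
    print(f"{key}: {left!r} -> {right!r}")
```

Anything that was never recorded stays invisible to a comparison like this, which is arguably the hard part.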