Absolutely. Global variables are always going to need some sort of arbitration mechanism, and here it happens in the innermost loop. Even if there's only one thread, something still has to check that at run time. I'm surprised it's only half the speed.
This benchmark shows abysmal numbers on its own. There's no need to run a bloated test suite so large and convoluted that no one understands what is being measured.
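For anyone who wants to poke at this themselves, here is a rough sketch of the kind of small, readable micro-benchmark I mean. Python is used purely for illustration, since the thread may be about a different runtime. The names `counter`, `bump_global`, and `bump_local` are made up for the example and are not taken from the benchmark being discussed. It compares touching a global on every iteration against accumulating in a local and writing the global once.

```python
import timeit

# Hypothetical module-level global standing in for the shared variable.
counter = 0


def bump_global(n):
    """Read and write the global on every pass through the inner loop."""
    global counter
    counter = 0
    for _ in range(n):
        counter += 1
    return counter


def bump_local(n):
    """Accumulate in a local, then publish to the global once at the end."""
    global counter
    total = 0
    for _ in range(n):
        total += 1
    counter = total
    return counter


if __name__ == "__main__":
    n = 1_000_000
    for fn in (bump_global, bump_local):
        # Timings depend heavily on the runtime and build; none are claimed here.
        elapsed = timeit.timeit(lambda: fn(n), number=5)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions compute the same result, so any gap in the timings comes from where the variable lives, which is about as easy to reason about as a benchmark gets.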