How would you measure that? How would you notice a failed attempt at finding a hole, at stack smashing, etc., if it did not even result in a crash or any other dysfunction?
I suppose if someone found a technique in the wild that defeats regular ASLR (even if only sometimes), they could then test that same technique against fine-grained ASLR and evaluate if the FG ASLR was more effective at preventing exploitation.
Any classic buffer overflow/stack smash can defeat ASLR, it just might take a long time to get lucky guessing addresses. Couldn’t we Monte Carlo this?
Maybe take a known vulnerable executable, create a fuzzing attacker, and run it against both builds to see how long it takes to get lucky a few times. The more secure version should take longer.
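A rough sketch of that Monte Carlo idea, as a toy model rather than a real exploit harness. It assumes the attacker guesses one address slot uniformly at random per attempt and that the layout is re-randomized on every attempt. The entropy figures (8 and 12 bits) and the function names are made up for illustration and are not measurements of any real ASLR implementation.

```python
import random
import statistics


def attempts_until_hit(entropy_bits, rng):
    """Count guesses until a random guess matches a freshly randomized slot."""
    slots = 1 << entropy_bits
    attempts = 0
    while True:
        attempts += 1
        # Assumption: the target re-randomizes its layout on every attempt.
        target = rng.randrange(slots)
        guess = rng.randrange(slots)
        if guess == target:
            return attempts


def mean_attempts(entropy_bits, trials, seed=0):
    """Average number of attempts per successful guess over many trials."""
    rng = random.Random(seed)
    return statistics.mean(
        attempts_until_hit(entropy_bits, rng) for _ in range(trials)
    )


if __name__ == "__main__":
    # Hypothetical entropy values, chosen only to show the comparison.
    for label, bits in [("coarse, 8 bits", 8), ("finer, 12 bits", 12)]:
        print(label, mean_attempts(bits, trials=200))
```

In this model the attempt count is geometrically distributed, so the average should land near 2^bits, and each extra bit of randomization roughly doubles the attacker's expected work. A real experiment would replace the random guess with an actual attack against the two builds.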
There are whitehats constantly looking at this sort of thing. It's why we can say that KASLR is a huge waste of time - because it's both theoretically bogus and we have actual exploit devs saying that they don't care about it.
My guess is you look for a security PoC that reads something like "We were unable to build the ROP chain on OpenBSD due to the gadget locations being unknowable."