Like a ML model I would prefer being scored with cross entropy and not right/wrong. Like, I might guess wrong but it might not be that far off in likelihood.
It is mitigating that we get so many questions, but I agree it's inefficient. As a human forecaster I also prefer being judged in part on my confidence in each of the alternatives.