Related to that, I know that the SINE Foundation[0] has been investigating the use of zero-knowledge proofs for benchmarking.
It also looks like they recently released a tool specifically for privacy-preserving benchmarking[1]. (I haven't looked into the contents of the repo itself to check whether they are actually using zero-knowledge proofs.)
How would you prove that a submission is not truthful? You can check whether it's an outlier, but that's definitely not foolproof, since a false but plausible-looking value would pass.
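
To make the outlier point concrete, here is a minimal sketch in Python. It assumes submissions are plain numbers and uses a modified z-score based on the median absolute deviation; the function name, the 3.5 threshold, and the sample values are my own illustrative choices, not anything taken from the SINE tool.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    Hypothetical helper for illustration only. The modified z-score is
    0.6745 * |x - median| / MAD, where MAD is the median absolute deviation.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]

# Made-up benchmark submissions; the last one is wildly off.
obvious_lie = [52.0, 48.5, 50.1, 49.7, 51.3, 95.0]
print(flag_outliers(obvious_lie))    # the 95.0 entry gets flagged

# Same data, but the last participant reports a false yet plausible value.
plausible_lie = [52.0, 48.5, 50.1, 49.7, 51.3, 50.0]
print(flag_outliers(plausible_lie))  # nothing gets flagged
```

The second call is the problem case: a check like this only catches submissions that look unusual, not ones that are untrue.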