In other words, "z sigma" means: a result like this occurring as a statistical fluke is just as likely as a standard normally distributed variable taking a value above z.
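To make the sigma-to-probability correspondence concrete, here is a small Python sketch. It assumes the one-sided convention implied by "a value above z" (the usual one in particle physics). The function names `sigma_to_p` and `p_to_sigma` are just illustrative.

```python
import math
from statistics import NormalDist

def sigma_to_p(z):
    """One-sided tail probability P(Z > z) for a standard normal Z."""
    # erfc avoids the cancellation in 1 - cdf(z) when z is large
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_to_sigma(p):
    """Inverse direction: the z such that P(Z > z) = p."""
    # 1 - p loses some precision when p is extremely small
    return NormalDist().inv_cdf(1 - p)

for z in (1, 2, 3, 5):
    print(f"{z} sigma -> p = {sigma_to_p(z):.3g}")
```

With this convention, the familiar "5 sigma" threshold corresponds to a tail probability of roughly 2.9e-7.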
I would add: "If the null hypothesis is true, then a result like this..." (in this case the null hypothesis is, of course, that the Standard Model is true).
If the null hypothesis were true, and the experiment were repeated an infinite number of times with a different sample each time, then "a result like this or more extreme..."
I agree with adding the "more extreme" part, but I'm not so sure about the "infinite number of times" part. Certainly, the p-value is (roughly speaking) the probability of seeing a result at least as extreme as the observed result, under the null hypothesis. But one doesn't really need to introduce hypothetical infinite sequences of replications to make sense of that definition.
Isn't the bit about repeating the study over and over again the whole basis of frequentist statistics, though? (Indeed isn't that why it's called frequentism?)
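Both readings can be compared side by side on a toy example. The sketch below assumes a hypothetical experiment (60 heads in 100 tosses, null hypothesis: the coin is fair) and computes the one-sided p-value two ways: directly as a binomial tail probability, with no replications involved, and as the fraction of simulated replications under the null that come out at least as extreme. The data, function names, and replication count are illustrative only.

```python
import math
import random

def exact_p_value(observed, n, p0=0.5):
    """P(B >= observed) for B ~ Binomial(n, p0): a tail probability, no replications."""
    return sum(math.comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(observed, n + 1))

def simulated_p_value(observed, n, p0=0.5, replications=20_000, seed=0):
    """Fraction of simulated experiments under the null with >= observed successes."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(replications):
        successes = sum(rng.random() < p0 for _ in range(n))
        extreme += successes >= observed
    return extreme / replications

observed, n = 60, 100  # hypothetical data: 60 heads in 100 tosses
print(exact_p_value(observed, n))      # tail probability under the null
print(simulated_p_value(observed, n))  # relative frequency over replications
```

The exact value here is about 0.028, and the simulated frequency agrees with it up to Monte Carlo error, so the long-run-frequency reading and the direct probability calculation describe the same number.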
Quoting: "The plot shows the first 50 simulations. In the first simulation I picked some distribution F_1. Let θ_1 be the median of F_1. I generated n = 100 observations from F_1 and then constructed the interval. The confidence interval is the first vertical line. The true value is the dot. For the second simulation, I chose a different distribution F_2. Then I generated the data and constructed the interval. I did this many times, each time using a different distribution with a different true median. The blue interval shows the one time that the confidence interval did not trap the median. I did this 10,000 times (only 50 are shown). The interval covered the true value 94.33% of the time. I wanted to show this plot because, when some texts show confidence interval simulations like this they use the same distribution for each trial. This is unnecessary and it gives the false impression that you need to repeat the same experiment in order to discuss coverage."
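The experiment described in the quote can be reproduced in spirit. The sketch below is not the quoted author's code. It assumes a distribution-free confidence interval for the median built from order statistics [X_(j), X_(k)], with ranks chosen from the Binomial(n, 1/2) distribution. Each trial draws from a randomly chosen family (normal, exponential, uniform, or lognormal) with random parameters, so every trial has a different true median. The families, parameter ranges, and function names are all hypothetical choices.

```python
import math
import random

def binom_cdf(k, n, p=0.5):
    """P(B <= k) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def median_ci_ranks(n, alpha=0.05):
    """1-based ranks (j, k) so that [X_(j), X_(k)] covers the median of any
    continuous distribution with probability >= 1 - alpha."""
    # Coverage of [X_(j), X_(n+1-j)] is 1 - 2 * P(B <= j - 1) with B ~ Bin(n, 1/2),
    # so take the largest j whose two tails together stay within alpha.
    j = 0
    while 2 * binom_cdf(j, n) <= alpha:
        j += 1
    if j == 0:
        raise ValueError("n is too small for this confidence level")
    return j, n + 1 - j

def random_distribution(rng):
    """Pick a hypothetical continuous distribution; return (sampler, true median)."""
    family = rng.choice(["normal", "exponential", "uniform", "lognormal"])
    if family == "normal":
        mu, sigma = rng.uniform(-10, 10), rng.uniform(0.5, 5)
        return (lambda: rng.gauss(mu, sigma)), mu
    if family == "exponential":
        rate = rng.uniform(0.1, 3)
        return (lambda: rng.expovariate(rate)), math.log(2) / rate
    if family == "uniform":
        a = rng.uniform(-10, 10)
        b = a + rng.uniform(0.5, 10)
        return (lambda: rng.uniform(a, b)), (a + b) / 2
    mu, sigma = rng.uniform(-1, 2), rng.uniform(0.2, 1.5)
    return (lambda: rng.lognormvariate(mu, sigma)), math.exp(mu)

def coverage(trials=10_000, n=100, alpha=0.05, seed=1):
    """Fraction of trials (each with a new distribution) whose interval traps the median."""
    rng = random.Random(seed)
    j, k = median_ci_ranks(n, alpha)
    hits = 0
    for _ in range(trials):
        draw, true_median = random_distribution(rng)
        xs = sorted(draw() for _ in range(n))
        lo, hi = xs[j - 1], xs[k - 1]  # 1-based ranks -> 0-based indices
        hits += lo <= true_median <= hi
    return hits / trials

print(median_ci_ranks(100))  # ranks used for n = 100
print(coverage())            # simulated coverage over 10,000 different distributions
```

Because this interval only uses the fact that each observation falls below the median with probability 1/2, its coverage does not depend on which continuous distribution generated the data, which is why changing the distribution from trial to trial is harmless. For n = 100 and alpha = 0.05 it is slightly conservative (nominal coverage about 96.5%), so the simulated coverage should land a little above 0.95 rather than at the 94.33% reported in the quote, whose interval may well be a different construction.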