> Meaning that the effect must be fairly strong to be observable in two studies with n=50?
Do what now? Isn't the problem that it could have randomly happened (especially if people did a bunch of other similar studies that didn't observe an effect, and only these two were published)?
To reach statistical significance at a smaller n, the observed effect size needs to be fairly large. With a huge number of people in a trial, even an effect too small to matter in practice can come out statistically significant.
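To put rough numbers on that, here is a minimal sketch. It assumes a two-group comparison with n participants per group (the thread doesn't say whether n=50 is per group or total), a two-sided test at alpha = 0.05, and a normal approximation with the effect expressed as a standardized mean difference. The function name `smallest_significant_d` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def smallest_significant_d(n_per_group, alpha=0.05):
    """Smallest observed standardized difference that reaches two-sided
    significance in a two-group comparison (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    std_error = sqrt(2 / n_per_group)             # SE of the standardized difference
    return z_crit * std_error

for n in (50, 500, 50_000):
    print(n, round(smallest_significant_d(n), 3))
# 50 0.392
# 500 0.124
# 50000 0.012
```

Under those assumptions, anything that clears the significance bar at n=50 per group has an observed effect of about 0.39 standard deviations or more, while at n=50,000 a difference of about 0.012 standard deviations is enough.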
The problem with a single study of n=50 isn't the 50, it's that it's a single study.
That's not quite right. If the study is underpowered at n = 50 (which is extremely likely), statistically significant estimates are likely to be inflated. And as power declines, significant estimates also become more likely to have the wrong sign (e.g., the study yields a significant positive estimate even though the true effect is negative).
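A quick simulation shows both problems. This is only a sketch under assumed numbers: the true effect (0.1 standard deviations) and the standard error (0.2, roughly what n=50 per group gives for a standardized difference) are picked for illustration, not taken from any study, and the estimate is treated as normally distributed around the true effect. The name `retrodesign` is just a label for this helper.

```python
import random
from statistics import NormalDist, mean

def retrodesign(true_effect, std_error, alpha=0.05, n_sims=100_000, seed=0):
    """Simulate replications of a study and summarize only the significant ones.

    Returns (power, wrong_sign_rate, exaggeration_ratio).
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - alpha / 2) * std_error
    # Each replication's estimate is the true effect plus sampling noise.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_sims)]
    significant = [e for e in estimates if abs(e) > threshold]
    power = len(significant) / n_sims
    # Share of significant estimates pointing the wrong way.
    wrong_sign_rate = sum(e * true_effect < 0 for e in significant) / len(significant)
    # How much larger the average significant estimate is than the truth.
    exaggeration_ratio = mean(abs(e) for e in significant) / abs(true_effect)
    return power, wrong_sign_rate, exaggeration_ratio

# Hypothetical inputs: small true effect vs. large true effect, same noise.
low = retrodesign(true_effect=0.1, std_error=0.2)
high = retrodesign(true_effect=1.0, std_error=0.2)
print("underpowered:", low)
print("well powered:", high)
```

In the underpowered case, every significant estimate has to exceed about 0.39 in absolute value, so it overstates the assumed true effect of 0.1 by a factor of nearly four at minimum, and a noticeable share of the significant results land on the wrong side of zero. In the well-powered case both problems essentially vanish.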
I would agree in general, but I would like to see three or more studies, as well as variations that test the boundaries of the effect.
Things can go wrong in one or two studies, so independent replication is needed to really cement things.