Hacker News
by AbrahamParangi 849 days ago
I'm confused in that I don't see how this is troubling. Yes, the two experimenters rolled dice and got the same result, but it's as if one of them was rolling a 6 sided die and the other a 20 sided one. Each experiment is not a result per se but a sample from a distribution.

How you infer the shape of that distribution based on the experiment is a function of the distribution of all courses your experiment could have taken. This set of paths is different in each case, which means the inference we make must also be different.

There is no inconsistency. The confusion seems to be in assuming that the experimental result was a true statement about the nature of the world rather than a true statement about simply what happened.

edit: This seems to me to be a specific case of a general class of difficult thinking where you ask yourself: "what are all the worlds that I might be in that are consistent with what I'm presently observing".

4 comments

If you see two people roll a d20 and get a 20, you get to say "wow, that was unlikely" to both of them, even if one of them privately admits they were going to quickly re-roll their die if they got below a 10. What matters is their actual behavior (identical in the example) not their intentions. The d6 vs d20 version is different because their behavior is different.
Let's imagine that we ran it as a simulation and we ran it a million times. The two people would have a different distribution of results. If you ignore the intention, you ignore reality as if that intention were not a part of it.

Do you not notice that your inference is less accurate using this line of reasoning? Does that not suggest that it's simply wrong?

What do you mean by 'results'?

They would not have different distributions of results on their first die roll.

They would have different distributions of results on their reported die roll.

If I am looking at their first die roll, the fact that they would have different reported die rolls doesn't matter!
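
A minimal simulation sketch of the first-roll versus reported-roll distinction. The re-roll rule (`reroll_below`), the trial count, and the seed are illustrative assumptions, not anything specified in the thread:

```python
import random
from collections import Counter

def simulate(n_trials, reroll_below, seed=0):
    """Roll a d20 n_trials times; re-roll once whenever the first roll is below reroll_below."""
    rng = random.Random(seed)
    first = Counter()     # what the die showed on the first roll
    reported = Counter()  # what the roller ends up reporting
    for _ in range(n_trials):
        roll = rng.randint(1, 20)
        first[roll] += 1
        if roll < reroll_below:        # hypothetical "quick re-roll" intention
            roll = rng.randint(1, 20)
        reported[roll] += 1
    return first, reported

n = 200_000
honest_first, honest_reported = simulate(n, reroll_below=0)   # never re-rolls
sneaky_first, sneaky_reported = simulate(n, reroll_below=10)  # re-rolls 1 through 9

print("P(first roll = 20):   ", honest_first[20] / n, sneaky_first[20] / n)
print("P(reported roll = 20):", honest_reported[20] / n, sneaky_reported[20] / n)
```

Under these assumptions the first-roll frequency of a 20 should come out near 1/20 for both rollers, while the re-roller's reported frequency should be near 1/20 + (9/20)(1/20) = 0.0725. Which of the two lines is relevant depends on whether you saw the first roll or only the report.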

Here’s another example:

Say you have a lazy researcher. They flip a coin, and if it comes up heads, they do the experiment. If it comes up tails, they just write down a random number.

If you _only get access_ to the final number, then you should discount what they wrote down – it’s 50% likely to be fake.

If you do 1,000,000 simulations of this, it’s useless 50% of the time.

But if you know the result of the coin flip, it doesn’t matter whether they would have generated a nonsense number in a different timeline, or that they’re not reliably accurate. _You know_ they’re reliably accurate in _this case_, so you can trust their data.
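
A rough sketch of the lazy-researcher example. The true effect, the noise level, and the range of the made-up number are all hypothetical values chosen for illustration:

```python
import random

TRUE_EFFECT = 0.7  # hypothetical quantity the experiment would measure

def lazy_researcher(rng):
    """Return (did_experiment, reported_value) for one lazy researcher."""
    did_experiment = rng.random() < 0.5           # the coin flip
    if did_experiment:
        value = rng.gauss(TRUE_EFFECT, 0.05)      # real measurement with small noise
    else:
        value = rng.uniform(0.0, 1.0)             # made-up number
    return did_experiment, value

def mean_abs_error(values):
    return sum(abs(v - TRUE_EFFECT) for v in values) / len(values)

rng = random.Random(1)
reports = [lazy_researcher(rng) for _ in range(100_000)]

# Error if you only ever see the final number.
all_err = mean_abs_error([v for _, v in reports])
# Error if you also see the coin and keep only the heads cases.
real_err = mean_abs_error([v for did, v in reports if did])

print("error over all reports:       ", all_err)
print("error given the coin was heads:", real_err)
```

Conditioning on the observed coin flip removes the fake reports entirely, so the error in the heads-only subset reflects just the measurement noise, regardless of what would have been written down on tails.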

This is well put. Coincidentally in the example the results are the same, but they need not be. Given repeated experiments with the same intentions one may expect different distributions.

However, one could just move the argument up a level and manufacture a case of different intentions leading to the same distributions and then ask the same question.

Imagine you have a machine that rolls a d20 and lies if the die comes up 1-19, and tells the truth on a 20. Should you trust this machine usually? No. But if you can _see that the die comes up 20_ then you should trust it. The fact that it sometimes might lie doesn't mean that you should distrust the machine if you can see that in this case it's telling the truth.
> Coincidentally in the example the results are the same, but they need not be.

The question is whether we should draw different conclusions when the results are the same. I don’t think that anyone has any issues with drawing different conclusions when the results are different!

Unlikely in what probability space? We only see one version of reality so the probabilities that we assign to any outcome are based on a prior choice of probability space. That is why the researchers' intent matters.
Both events have the same probability of happening: 1/20. The fact that the researcher intended to do something in a reality that didn't happen isn't relevant.
If you want to know whether a drug is more effective than placebo, the answer to that question depends on both the data collected in a study and the initial study design. There’s a reason why it’s meaningless to say “that was unlikely” after somebody says they were born on January 1, or after getting a two-factor code that is the same digit six times. There’s nothing special about those particular events except for the fact that we noticed them. Since we live in a single instance of the universe where they have already happened, they have probability 1. At the same time, on any given instance they have probability 1/365ish or 1/100000. The difference between these two interpretations of the probability is the same difference as having a good experimental design vs a flawed experimental design where you repeat the experiment until you get the results you want to see.
> a flawed experimental design where you repeat the experiment until you get the results you want to see.

But the Bayesian point is that, if you use Bayesian statistics, this doesn't work. Except by outright lying about their experimental protocol or the data that was actually collected (for example, only reporting the successful trial at the end and not all the failed ones that preceded it), an experimenter cannot "fool" you into accepting a hypothesis not justified by the data. They can point to the one successful trial all they want, and make up stories about how the previous failed trials were somehow different, and the Bayesian simply does not care. The Bayesian just looks at the entire corpus of data and finds that it doesn't support the hypothesis, and that's it.
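
A small sketch of that claim, under assumptions that are mine rather than the thread's: a fair coin (`true_rate=0.5`), batches of 10 flips, an experimenter who repeats batches until one shows at least 8 heads, and a uniform Beta(1, 1) prior on the heads rate.

```python
import random

BATCH_SIZE = 10
THRESHOLD = 8  # hypothetical "success": at least 8 heads in a batch

def run_until_success(rng, true_rate=0.5):
    """Repeat batches of flips until one reaches THRESHOLD; return heads per batch."""
    batches = []
    while True:
        heads = sum(rng.random() < true_rate for _ in range(BATCH_SIZE))
        batches.append(heads)
        if heads >= THRESHOLD:
            return batches

def beta_posterior_mean(heads, tails):
    """Posterior mean of the heads rate under a uniform Beta(1, 1) prior."""
    return (heads + 1) / (heads + tails + 2)

rng = random.Random(7)
means_all, means_last = [], []
for _ in range(2000):
    batches = run_until_success(rng)
    total_heads = sum(batches)
    total_flips = BATCH_SIZE * len(batches)
    # Posterior from every batch that was actually run.
    means_all.append(beta_posterior_mean(total_heads, total_flips - total_heads))
    # Posterior from only the final, "successful" batch.
    means_last.append(beta_posterior_mean(batches[-1], BATCH_SIZE - batches[-1]))

avg_all = sum(means_all) / len(means_all)
avg_last = sum(means_last) / len(means_last)
print("average posterior mean, all data:       ", avg_all)
print("average posterior mean, last batch only:", avg_last)
```

The last-batch-only estimate is at least 0.75 by construction, while the estimate that uses every batch stays much closer to the true rate. The inflated number only appears if the earlier batches are hidden, which is the "outright lying" case described above.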

Yes, indeed.
I had the same reaction.

We don't actually care at all about what happened in the two experiments per se, we care about the information provided by the experiments about future or other events.

If somehow we learned that both experiments were totally unreplicable and a product purely of that time and location with no implications for anything else ever before or since we wouldn't care about them except maybe as a historical curiosity.

Intentionality is a red herring; what matters is our expectation about what might be observed if we were to repeat the experiments again.

In that sense, there's variability in the second experiment's results due to sample size being random. So we interpret and infer based on that potential experiment we could do, not what happened to be observed at a particular moment.

I'm also confused about what this has to do with Bayesian versus non-Bayesian inference as you could approach either experiment from either paradigm, and there are different forms of Bayesianism, including nonsubjective Bayesianism.

> We don't actually care at all about what happened in the two experiments per se, we care about the information provided by the experiments about future or other events.

How can the experiments provide relevant information other than through what happened?

If what happened is exactly the same (first patient with such and such characteristics had this outcome, etc.) what information can be provided by the things that didn’t happen in either?

How could it matter that the things that didn’t happen in one experiment are different from the things that didn’t happen in the other when we are interested in the information provided by what did happen?

We don't actually care at all about the distribution of things that could have happened per se, we care about the information provided by the experiments about future or other events.

> Each experiment is not a result per se but a sample from a distribution.

But what distribution? What is this "distribution" that we are taking a sample from?

The frequentist says: because the two experimenters have different intentions, the experiments they ran are samples from different distributions.

But the Bayesian says: the experimenter's intentions can't affect things like how dice rolls come out or how well a given treatment works on a given patient. The actual "distribution" is the set of all factors that do affect how the dice rolls come out or how well the treatment works on each patient. And those factors are the same for both experimenters; their different intentions don't affect that. So both sets of data are samples from the same distribution, not different ones.

> How you infer the shape of that distribution based on the experiment is a function of the distribution of all courses your experiment could have taken.

If you're going to state it this way, then the Bayesian response is: "all courses your experiment could have taken" has nothing to do with the experimenter's intentions. The experimenters can't magically make the physical world and the biology of humans work differently depending on what stopping criterion they choose. And the physical world and the biology of humans is what determines "the courses your experiment could have taken".

In other words, when the frequentist makes up "distributions" based on the experimenter's stopping criterion, they are, whether they admit it (or even realize it) or not, making a claim about how the physical world and the biology of humans works that is obviously false.
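
To make the two positions concrete, here is a sketch with hypothetical data (9 successes and 3 failures, not taken from the article) and two hypothetical designs: design A fixes 12 trials in advance, design B stops at the third failure. The uniform prior, the grid, and the point null of 0.5 are also assumptions for illustration.

```python
from math import comb

successes, failures = 9, 3          # hypothetical observed data
n = successes + failures

def binomial_lik(theta):
    # Design A: number of trials fixed at n in advance.
    return comb(n, successes) * theta**successes * (1 - theta)**failures

def neg_binomial_lik(theta):
    # Design B: stop at the 3rd failure, so the last trial is a failure.
    return comb(n - 1, successes) * theta**successes * (1 - theta)**failures

grid = [(i + 0.5) / 1000 for i in range(1000)]  # candidate success rates

def posterior(lik):
    weights = [lik(t) for t in grid]  # uniform prior, so posterior is proportional to likelihood
    total = sum(weights)
    return [w / total for w in weights]

post_a = posterior(binomial_lik)
post_b = posterior(neg_binomial_lik)
max_diff = max(abs(a - b) for a, b in zip(post_a, post_b))

# One-sided p-values for the null "success rate = 0.5".
# Design A: probability of 9 or more successes in 12 trials.
p_fixed_n = sum(comb(n, k) for k in range(successes, n + 1)) / 2**n
# Design B: 9 or more successes before the 3rd failure, i.e. at most
# 2 failures among the first 11 trials.
p_stop_on_failures = sum(comb(n - 1, k) for k in range(failures)) / 2**(n - 1)

print("largest posterior difference:", max_diff)
print("p-value, fixed n:            ", p_fixed_n)           # 299/4096, about 0.073
print("p-value, stop at 3rd failure:", p_stop_on_failures)  # 67/2048, about 0.033
```

The two likelihoods differ only by a constant factor, so the normalized posteriors agree up to floating-point error, which matches the Bayesian position above. The p-values differ because each sums over a different set of outcomes that did not occur, which is the frequentist sense in which the stopping rule changes the distribution.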

This seems to assume that intentions "don't count" in some way, as if they were nonphysical, whereas unless you presume a supernatural soul, they are as physical as any other part of the experiment.
> This seems to assume that intentions "don't count" in some way, as if they were nonphysical

Not nonphysical: just not part of the physical degrees of freedom that can affect things like how die rolls come up or how well a given treatment works on a patient.

The experimenter's intentions (not about the stopping criterion, but about other things) can of course be upstream physical causes, so to speak, of things like what the actual process of the treatment is, and that can, of course affect how well the treatment works. But in the scenario under discussion, all those things are stipulated to be the same in both experiments. And once that is specified, whatever physical variation corresponds to the variation in the experimenters' intentions cannot affect the results.

> Not nonphysical: just not part of the physical degrees of freedom that can affect things like how die rolls come up or how well a given treatment works on a patient.

For a die that is not a concern (unless animism is taken into consideration), but when humans are on both sides of the equation, how do you get rid of all the social and psychological effects that implies, including placebo and the desire to see the study bend in some direction, be it at some unconscious level?

> when humans are on both sides of the equation, how do you get rid of all the social and psychological effects that implies, including placebo and the desire to see the study bend in some direction, be it at some unconscious level?

You don't. But again, in the scenario under discussion, these are stipulated to be the same in both cases. (Or more precisely, the underlying unconscious factors involved have the same distribution in both cases.) So again, these kinds of "intentions" don't make the distributions different in the two cases.

Another comment is relevant here. The whole point of things like double blind studies in medicine is to make it the case that, whatever unconscious factors are involved along the lines you describe, they don't change the underlying distribution from which the sampled results are drawn. In the scenario as described in the article, it was assumed that all of these precautions were taken. That is part of the reason for the article's statement that the experimenters' intentions about the stopping criterion can't affect the results.

Of course, if you know that in a particular case those precautions were not taken, that changes how you view the results. But Bayesian analysis can cover this case too: you just expand your hypothesis space and your prior to include things like "the experimenters unconsciously influenced the results in different ways because of their different stopping criteria". The article excludes this possibility in the scenario it describes, but in the real world, yes, we know it is possible to have study designs that don't eliminate this failure mode, and our analysis should allow for that in cases where the study design was such that it might have happened.

> But again, in the scenario under discussion, these are stipulated to be the same in both cases.

That's not what I read. The two studies imply that they are conducted under two very different mindsets, which will most likely also mean people will receive a different human treatment. At this point, the statistics you will get out of it are almost ornamental. Sometimes the most significant information to extract from a description is not the most obvious part, the one pointed at as the thing you can quantify and compare.

The question is whether we should draw different conclusions from one set of observations depending not just on what we are observing but also on different ways to define "what are all the worlds that I might be in that are consistent with what I'm presently observing".