|
From _Probability Theory: The Logic of Science_:

> Then the possibility seems open that, for different priors, different functions r(x1,..., xn) of the data may take on the role of sufficient statistics. This means that use of a particular prior may make certain particular aspects of the data irrelevant. Then a different prior may make different aspects of the data irrelevant. One who is not prepared for this may think that a contradiction or paradox has been found.

I think this explains one of the confusions many commenters have: for an experimenter who repeats observations until they reach their desired ratio r/(n-r), the ratio r/(n-r) is not a sufficient statistic! But for an experimenter who has pre-registered n, the ratio r/(n-r) is a sufficient statistic. However, in either case,

> We did not include n in the conditioning statements in p(D|θ I) because, in the problem as defined, it is from the data D that we learn both n and r. But nothing prevents us from considering a different problem in which we decide in advance how many trials we shall make; then it is proper to add n to the prior information and write the sampling probability as p(D|nθ I). Or, we might decide in advance to continue the Bernoulli trials until we have achieved a certain number r of successes, or a certain log-odds u = log[r/(n − r)]; then it would be proper to write the sampling probability as p(D|rθ I) or p(D|uθ I), and so on.
Does this matter for our conclusions about θ?

> In deductive logic (Boolean algebra) it is a triviality that AA = A; if you say: ‘A is true’ twice, this is logically no different from saying it once. This property is retained in probability theory as logic, since it was one of our basic desiderata that, in the context of a given problem, propositions with the same truth value are always assigned the same probability. In practice this means that there is no need to ensure that the different pieces of information given to the robot are independent; our formalism has automatically the property that redundant information is not counted twice.

|
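To make the question in the Jaynes passage above concrete, here is a rough numerical sketch comparing two of the designs he names: n fixed in advance versus r fixed in advance. It assumes a uniform prior on a grid of θ values, and the data n = 12, r = 9 are made up for illustration; `grid_posterior` is a hypothetical helper, not anything from the book. It does not cover the log-odds stopping rule.

```python
from math import comb

def grid_posterior(likelihood, grid):
    """Normalize likelihood(theta) over a grid, assuming a uniform prior."""
    weights = [likelihood(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]

n, r = 12, 9  # illustrative data only, not taken from the source
grid = [(i + 0.5) / 200 for i in range(200)]  # theta values in (0, 1)

# n decided in advance: binomial probability of r successes in n trials.
fixed_n = grid_posterior(
    lambda t: comb(n, r) * t**r * (1 - t) ** (n - r), grid
)

# r decided in advance: negative binomial probability that the
# r-th success arrives on trial n.
fixed_r = grid_posterior(
    lambda t: comb(n - 1, r - 1) * t**r * (1 - t) ** (n - r), grid
)

max_gap = max(abs(a - b) for a, b in zip(fixed_n, fixed_r))
print(f"largest difference between the two posteriors: {max_gap:.2e}")
```

The two sampling probabilities differ only by a factor that does not depend on θ, so the factor cancels when the posterior is normalized. Note that both n and r from the data enter the calculation in each case.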
Bayes' Theorem holds because it can be proven. Therefore, situations can be constructed where interpreting identical data without considering priors gives nonsense conclusions. For example, if we happen to know a priori that P(outcome of experiment is a certain ratio) = P(experiment is completed), then that must be taken into account when interpreting the results.
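A small simulation can illustrate that last example. This is only a sketch under assumptions of my own: the target ratio of 2, the cap of 300 trials, the two θ values, and the function names `run_until_ratio` and `summarize` are all hypothetical choices for illustration. The experimenter keeps running Bernoulli trials until r/(n-r) hits the target exactly, or gives up at the cap.

```python
import random

def run_until_ratio(theta, rng, num=2, den=1, max_trials=300):
    """Run Bernoulli(theta) trials until r/(n-r) equals num/den exactly.

    Returns (n, r) on completion, or None if max_trials is reached first.
    """
    r = 0
    for n in range(1, max_trials + 1):
        if rng.random() < theta:
            r += 1
        # Integer comparison avoids dividing by zero when n == r.
        if r * den == num * (n - r):
            return n, r
    return None

def summarize(theta, runs=2000, seed=0):
    """Return (set of reported ratios, completion rate, mean n if completed)."""
    rng = random.Random(seed)
    results = (run_until_ratio(theta, rng) for _ in range(runs))
    done = [res for res in results if res is not None]
    ratios = {r / (n - r) for n, r in done}
    mean_n = sum(n for n, _ in done) / len(done)
    return ratios, len(done) / runs, mean_n

for theta in (0.5, 0.8):
    ratios, completed, mean_n = summarize(theta)
    print(f"theta={theta}: ratios={ratios}, "
          f"completed={completed:.2f}, mean n={mean_n:.1f}")
```

Every completed experiment reports the same ratio whatever θ is, so under this stopping rule the ratio by itself says nothing about θ. Whatever information there is sits in whether the experiment completed and in how many trials it took.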