| > understanding the nuances between words like probability and likelihood

Go into a lab, do an experiment, call that
one trial, measure a number, call that
number (the value at this trial of)
random variable X. We might want the average value, expected
value, or expectation of X denoted by
E[X]. Under meager assumptions, if we take a
sequence of independent samples of X, then
their average will converge to E[X]; this
is the law of large numbers.

We might be interested in the event, call it A, when X > 1. We might want
the probability of A, that is, P(A) = P(X > 1).

For random variable X, we can define its cumulative distribution: For real
number x,

F_X(x) = P(X <= x)

[Here we are using TeX notation where F_X is F with a subscript X.]

Then with calculus and meager assumptions, the probability density of X is

f_X(x) = d/dx F_X(x)

With meager assumptions, calculus and f_X(x) can give us the expectation
E[X] as the integral of x f_X(x) over all real x.

The likelihood of X = x is just f_X(x),
that is, the value of the density at x. For the Gaussian distribution, the maximum
likelihood is at the central peak of the
density, which is also the expectation.

In some approaches to statistical estimation, we have some data and seek
the parameter estimate that maximizes the likelihood of getting the data
we actually did get; this is maximum likelihood estimation.

Given events A and B,
we can define the conditional probability
of event A given event B by

P(A|B) = P(A and B) / P(B)

(assuming P(B) > 0). So, if we think of events as geometric regions and
their probabilities as their areas (actually part of a serious approach),
then P(A|B) is the fraction of B that is also A.

Since also P(A and B) = P(B|A) P(A), we get

P(A|B) = P(B|A) P(A) / P(B)

which is Bayes' rule. If we do experiments and believe from whatever prior
to the experiment that we have a meaningful estimate of P(A) or P(B|A),
then maybe we are being Bayesian.

More generally, knowing that event B
occurred we can regard that as
information we have obtained, and what
that information says about event A is
just P(A|B).

Then, if events A and B are independent, event B gives us no more
information about event A and we have

P(A|B) = P(A)

So, if we are interested in event A and its probability P(A) and suddenly
are told that event B occurred, then for event A we now want the updated
view P(A|B).

Using the measure theory foundations of
probability and the Radon-Nikodym theorem
of measure theory, under meager
assumptions we can define for random
variables X and Y

E[Y|X]

which is a function, say, f(X), of random variable X and, when E[Y^2] is
finite, the best least squares estimate of Y among all (measurable,
possibly non-linear) functions of X.

This measure theory approach also lets us
define E[Y|Z] for an infinite set Z of random variables.
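
For the simple two-variable case E[Y|X], the least squares property can be checked numerically. The sketch below is only an illustration under assumptions that are not from the discussion above: X takes the values 0, 1, 2, 3 with equal probability, and Y = X^2 plus Gaussian noise, so that E[Y|X] = X^2. It estimates E[Y|X = x] by averaging Y within each value of x and compares the mean squared error against the best straight line in X.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Draw n pairs (x, y) with y = x^2 + noise (an assumed toy model)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.choice([0, 1, 2, 3])
        y = x * x + rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def conditional_means(pairs):
    """Estimate E[Y | X = x] by the average of y within each value of x."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in pairs:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def least_squares_line(pairs):
    """Ordinary least squares fit y ~ slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(pairs, g):
    """Mean squared error of predicting y by g(x)."""
    return sum((y - g(x)) ** 2 for x, y in pairs) / len(pairs)


pairs = simulate(20000)
cond = conditional_means(pairs)
slope, intercept = least_squares_line(pairs)

mse_cond = mse(pairs, lambda x: cond[x])
mse_line = mse(pairs, lambda x: slope * x + intercept)

print("estimated E[Y|X=x]:", cond)
print("MSE using conditional means:", mse_cond)
print("MSE using best straight line:", mse_line)
```

On the sample itself, the within-group averages cannot lose to any other function of x, the straight line included, which mirrors the least squares property of E[Y|X]. Nothing in this sketch reaches the case of E[Y|Z] for an infinite set Z; that needs the measure theory definition.
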
This definition is useful, e.g., in the
Poisson process where each increment of
time to the next arrival is independent of
all previous increments, Markov processes,
a stochastic process adapted to a
history, etc. |
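
The earlier definitions (expectation, probability of an event, conditional probability, likelihood) can also be checked with a small simulation. This is a rough sketch, not a definitive treatment, and several choices are assumptions made only for illustration: X is taken to be Gaussian with mean 2 and standard deviation 1, the event B is taken to be X < 2, and the likelihood is maximized by a crude grid search over candidate means.

```python
import math
import random

# Assumed values, for illustration only.
TRUE_MEAN, TRUE_SD = 2.0, 1.0

rng = random.Random(1)
samples = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(50000)]
n = len(samples)

# Law of large numbers: the average of independent samples approaches E[X].
sample_mean = sum(samples) / n

# Probability of the event A = {X > 1}, estimated as a relative frequency.
p_a = sum(x > 1 for x in samples) / n

# Conditional probability P(A|B) = P(A and B) / P(B), with B = {X < 2}.
p_b = sum(x < 2 for x in samples) / n
p_a_and_b = sum(1 < x < 2 for x in samples) / n
p_a_given_b = p_a_and_b / p_b


def gaussian_log_likelihood(mu, data, sd=TRUE_SD):
    """Log of the product of Gaussian densities f_X(x) over the data."""
    const = -0.5 * math.log(2 * math.pi * sd * sd)
    return sum(const - (x - mu) ** 2 / (2 * sd * sd) for x in data)


# Likelihood: hold the data fixed and vary the candidate mean mu.
data = samples[:200]
grid = [i / 100 for i in range(0, 401)]  # candidate means 0.00 .. 4.00
mle = max(grid, key=lambda mu: gaussian_log_likelihood(mu, data))
data_mean = sum(data) / len(data)

print("sample mean:", sample_mean)
print("P(A) estimate:", p_a)
print("P(A|B) estimate:", p_a_given_b)
print("grid MLE of the mean:", mle, "vs data average:", data_mean)
```

The probabilities come from fixing the distribution and asking about events; the likelihood comes from fixing the data and asking which candidate mean makes the density at that data largest. For the Gaussian with known standard deviation, the grid maximizer lands next to the plain average of the data.
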