# Emanuel Winterfors

## February 8, 2008

### On the two different forms of Bayes’ rule

Filed under: Probability theory — Tags: — Emanuel Winterfors @ 2:35 am

Bayes’ rule for probability densities is a somewhat treacherous area: it is derived differently from Bayes’ rule for probabilities, even though the two rules have the same form. Let’s start off with the latter.

#### Bayes’ rule for probabilities

Given the (nonzero) probability $P(B)$ for the event $B$ and the probability $P(A \cap B)$ for the event $A \cap B$ (that is, the events $A$ and $B$ occurring simultaneously), one can define the conditional probability of $A$ given $B$:

 $P(A|B) \equiv \frac{{P(A \cap B)}}{{P(B)}}$ (1)

Applying the same definition to $P(B|A)$ (which requires $P(A)$ to be nonzero as well) gives, by symmetry, $P(A|B)P(B) = P(A \cap B) = P(B|A)P(A)$, which yields Bayes’ rule for probabilities

 $P(A|B) = \frac{{P(B|A)P(A)}}{{P(B)}}$ (2)
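
As a quick sanity check, definition (1) and rule (2) can be evaluated on a small discrete example. The sketch below assumes a made-up joint distribution over two binary events; the table `joint` and the helper `prob` are illustrative names, not part of the derivation above. Exact fractions are used so that both sides of (2) can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B.
# Each entry is P(A=a and B=b); the numbers are illustrative only.
joint = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(1, 20),
    (False, True): Fraction(5, 20),
    (False, False): Fraction(11, 20),
}


def prob(event):
    """Sum the joint probability over outcomes (a, b) where event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))


p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_and_b = prob(lambda a, b: a and b)

# Definition (1), applied in both directions (both P(A) and P(B) are nonzero).
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Right-hand side of Bayes' rule (2).
bayes_rhs = p_b_given_a * p_a / p_b

print(p_a_given_b, bayes_rhs)  # prints "3/8 3/8"
```

Both numbers agree because each is just $P(A \cap B)/P(B)$ written in a different way.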

#### Bayes’ rule for probability densities

A probability density (or PDF, for probability density function) is a function $p(\theta )$ defined over some set $\Theta$ such that

 $P(A) = \int\limits_{\theta \in A} {p(\theta )d\Theta }$ (3)

for any measurable subset (event) $A \subseteq \Theta$, where $d\Theta$ is some measure on the set $\Theta$ (typically, but not necessarily, the Lebesgue measure of the coordinates used to span $\Theta$).
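
Equation (3) can be made concrete with a one-dimensional example. The sketch below assumes the density $p(\theta) = 2\theta$ on $\Theta = [0, 1]$ with the Lebesgue measure, and approximates the integral with a midpoint rule; the density and the helper names are chosen for illustration only.

```python
def density(theta):
    """Assumed example density p(theta) = 2*theta on Theta = [0, 1]."""
    return 2.0 * theta


def probability(lower, upper, steps=1000):
    """Approximate P(A) for the interval A = [lower, upper] by the midpoint rule,
    taking dTheta to be the Lebesgue measure (ordinary length)."""
    width = (upper - lower) / steps
    return sum(density(lower + (i + 0.5) * width) for i in range(steps)) * width


total = probability(0.0, 1.0)       # P(Theta), should be close to 1
upper_half = probability(0.5, 1.0)  # exact value is 1 - 0.25 = 0.75

print(total, upper_half)
```

Note that the density itself exceeds 1 near $\theta = 1$; only its integrals over events are probabilities.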

A conditional probability density $p(\theta |y)$ is a probability density that depends on an additional variable $y \in \Omega$, where $\Omega$ is generally not the same set as $\Theta$ (unlike the conditional probability defined above, where $A$ and $B$ are events in the same space). For every fixed $y$ it is a probability density over $\Theta$ in the same sense as $p(\theta )$, giving the probability of $A$ for that value of $y$:

 $P(A|y) = \int\limits_{\theta \in A} {p(\theta |y)d\Theta } \qquad \forall y \in \Omega$ (4)

If there is a probability density $p(y)$ associated with elements $y \in \Omega$ with respect to a measure $d\Omega$ on $\Omega$, one can define a joint probability density

 $p(\theta ,y) \equiv p(\theta |y)p(y)$ (5)

which is a probability density over the joint set $\Xi = \Theta \times \Omega$ with respect to the measure $d\Xi = d\Theta d\Omega$, since

 $P(A) = \int\limits_{(\theta ,y) \in A} {p(\theta ,y)d\Xi } \qquad \forall A \subseteq \Xi$ (6)

(easily verified by insertion).
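
To spell out the insertion for the simplest case, take $A = \Xi$ and substitute (5) into (6). Assuming the integral over $\Xi$ can be carried out as an iterated integral, and using that (4) with $A = \Theta$ gives $\int\limits_{\theta \in \Theta} {p(\theta |y)d\Theta } = 1$ for every $y$:

 $\int\limits_{(\theta ,y) \in \Xi} {p(\theta ,y)d\Xi } = \int\limits_{y \in \Omega} {p(y)\left[ {\int\limits_{\theta \in \Theta} {p(\theta |y)d\Theta } } \right]d\Omega } = \int\limits_{y \in \Omega} {p(y)d\Omega } = 1$

Since $p(\theta ,y)$ is a product of two non-negative functions, it is non-negative and integrates to one over $\Xi$, which is what is required of a probability density.
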
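Defining the marginal density $p(\theta ) \equiv \int\limits_{y \in \Omega} {p(\theta ,y)d\Omega }$ and, wherever it is nonzero, the conditional density $p(y|\theta ) \equiv p(\theta ,y)/p(\theta )$, we again have by symmetry that $p(\theta |y)p(y) = p(\theta ,y) = p(y|\theta )p(\theta )$, which gives Bayes’ rule for probability densities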

 $p(\theta |y) = \frac{{p(y|\theta )p(\theta )}}{{p(y)}}$ (7)
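
Rule (7) can be checked numerically on a case where the answer is known in closed form. The sketch below assumes a standard normal prior $p(\theta)$ and a unit-variance normal likelihood $p(y|\theta)$ centred on $\theta$, for which the posterior given $y$ is normal with mean $y/2$ and variance $1/2$. The model, the observed value `y_obs`, and the function names are all illustrative assumptions; $p(y)$ is obtained by integrating the joint density (5) over $\theta$ with a midpoint rule.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def prior(theta):
    """Assumed prior p(theta): standard normal."""
    return normal_pdf(theta, 0.0, 1.0)


def likelihood(y, theta):
    """Assumed likelihood p(y|theta): normal around theta with unit variance."""
    return normal_pdf(y, theta, 1.0)


def evidence(y, lower=-10.0, upper=10.0, steps=4000):
    """p(y) as the integral of p(y|theta) p(theta) over theta (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        theta = lower + (i + 0.5) * width
        total += likelihood(y, theta) * prior(theta)
    return total * width


y_obs = 1.0
p_y = evidence(y_obs)


def posterior(theta):
    """Bayes' rule (7) evaluated pointwise for the observed y."""
    return likelihood(y_obs, theta) * prior(theta) / p_y


theta_test = 0.3
numeric = posterior(theta_test)
closed_form = normal_pdf(theta_test, y_obs / 2.0, 0.5)
print(numeric, closed_form)
```

The two printed values should agree to many digits; the only approximations are the truncation of the $\theta$ range and the quadrature used for $p(y)$.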

Bayes’ rule for probability densities is thus derived from the definition of the joint probability density (5), and not from the definition of conditional probability (1). It is also more limited than the probability formulation, since it applies only to joint probability spaces that can be separated into two sets of coordinates.

 © 2008 Emanuel Winterfors