Emanuel Winterfors

February 8, 2008

On the two different forms of Bayes’ rule

Filed under: Probability theory — Emanuel Winterfors @ 2:35 am

Bayes’ rule for probability densities is somewhat treacherous territory, since it is derived differently from Bayes’ rule for probabilities even though the two have the same form. Let’s start with the latter.

Bayes’ rule for probabilities
Given the (nonzero) probability P(B) for the event B and the probability P(A \cap B) for the event A \cap B (that is, the events A and B occurring simultaneously), one can define the conditional probability of A given B:

P(A|B) \equiv \frac{P(A \cap B)}{P(B)}

(1)

This implies by symmetry that P(A|B)P(B) = P(A \cap B) = P(B|A)P(A), which gives Bayes’ rule for probabilities

P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}

(2)
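As a quick sanity check (not part of the derivation above), here is a short Python sketch that applies definition (1) and rule (2) to a small, made-up joint probability table; the two routes give the same conditional probability.

```python
# A minimal numerical check of Bayes' rule for probabilities.
# The joint probabilities below are made-up numbers, chosen only for illustration.

# Keys are (A occurred?, B occurred?); values are the joint probabilities.
p_joint = {
    (True, True): 0.12,
    (True, False): 0.28,
    (False, True): 0.18,
    (False, False): 0.42,
}

P_A = sum(p for (a, _), p in p_joint.items() if a)   # P(A)
P_B = sum(p for (_, b), p in p_joint.items() if b)   # P(B)
P_A_and_B = p_joint[(True, True)]                    # P(A ∩ B)

# Definition (1): conditional probabilities from the joint
P_A_given_B = P_A_and_B / P_B
P_B_given_A = P_A_and_B / P_A

# Bayes' rule (2): should reproduce P(A|B)
bayes = P_B_given_A * P_A / P_B
print(P_A_given_B, bayes)   # both ~0.4
```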

Bayes’ rule for probability densities
A probability density (or PDF, for probability density function) is a function p(\theta) defined over some set \Theta such that

P(A) = \int\limits_{\theta \in A} p(\theta)\, d\Theta

(3)

for any measurable subset (event) A \subseteq \Theta , where d\Theta is some measure on set \Theta (typically, but not necessarily, the Lebesgue measure of the coordinates used to span set \Theta ).
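To make definition (3) concrete, here is a small Python sketch with an illustrative choice of density (my own, not implied by the post): \Theta is the real line with the Lebesgue measure, p(\theta) is a standard normal density, and the event is A = [0, 1]. The integral is approximated by a Riemann sum on a fine grid.

```python
import numpy as np

# A numerical sketch of definition (3): P(A) as an integral of a density over A.
# The standard normal density is an illustrative choice only.
def p(theta):
    return np.exp(-theta**2 / 2) / np.sqrt(2 * np.pi)

# Event A = [0, 1]; approximate the integral by a Riemann sum on a fine grid
theta, dtheta = np.linspace(0.0, 1.0, 100_001, retstep=True)
P_A = np.sum(p(theta)) * dtheta

print(P_A)   # ~0.3413, the familiar probability of landing within one sigma above the mean
```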

A conditional probability density p(\theta |y) is a probability density that depends on an additional variable y \in \Omega, where \Omega is in general not the same set as \Theta (unlike the conditional probability defined above, where A and B are events in the same space). For every fixed y it is a probability density in the same sense as p(\theta):

P(A) = \int\limits_{\theta \in A} p(\theta|y)\, d\Theta \qquad \forall y \in \Omega

(4)
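Here is a brief numerical illustration of property (4), again using an illustrative Gaussian family of my own choosing: for each fixed y, the density p(\theta |y) integrates to one over \Theta.

```python
import numpy as np

# A sketch of property (4): for every fixed y, p(theta | y) is itself a
# normalized density over Theta. The Gaussian family below is an example only.
def p_theta_given_y(theta, y):
    return np.exp(-(theta - y)**2 / 2) / np.sqrt(2 * np.pi)

theta, dtheta = np.linspace(-12.0, 12.0, 200_001, retstep=True)
for y in (-2.0, 0.0, 3.5):
    total = np.sum(p_theta_given_y(theta, y)) * dtheta
    print(y, total)   # each total is ~1.0, whatever the value of y
```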

If there is a probability density p(y) associated with elements y \in \Omega with respect to a measure d\Omega on \Omega , one can define a joint probability density

p(\theta ,y) \equiv p(\theta |y)p(y)

(5)

which is a probability density over the joint set \Xi  = \Theta  \times \Omega with respect to the measure d\Xi  = d\Theta d\Omega , since

P(A) = \int\limits_{(\theta, y) \in A} p(\theta, y)\, d\Xi \qquad \forall A \subseteq \Xi

(6)

(easily verified by inserting (5) into (6), integrating first over \Theta using (4) and then over \Omega using the normalization of p(y)).
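The same verification can be carried out numerically. The sketch below builds p(\theta ,y) from an illustrative pair of Gaussian densities via (5) and checks statement (6) for A = \Xi on a grid.

```python
import numpy as np

# A sketch of (5) and (6): build p(theta, y) = p(theta | y) p(y) on a grid and
# check that it integrates to 1 over Xi = Theta x Omega. Both densities are
# illustrative Gaussians, not fixed by the derivation itself.
def normal_pdf(x, mean=0.0, sd=1.0):
    return np.exp(-((x - mean) / sd)**2 / 2) / (sd * np.sqrt(2 * np.pi))

theta, dtheta = np.linspace(-14.0, 14.0, 1401, retstep=True)
y, dy = np.linspace(-8.0, 8.0, 801, retstep=True)
T, Y = np.meshgrid(theta, y, indexing="ij")

p_joint = normal_pdf(T, mean=Y) * normal_pdf(Y)   # definition (5)

# Statement (6) with A = Xi, approximated by a Riemann sum over the grid
total = np.sum(p_joint) * dtheta * dy
print(total)   # ~1.0
```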
By symmetry, we again have p(\theta |y)p(y) = p(\theta ,y) = p(y|\theta )p(\theta ), which gives Bayes’ rule for probability densities

p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)}

(7)

Bayes’ rule for probability densities is thus derived from the definition of the joint probability density (5), not from the definition of conditional probability (1). It is also more limited than the probability formulation, since it applies only to joint probability spaces in which two sets of coordinates can be separated out.
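To close, here is a numerical sketch of rule (7) with an illustrative Gaussian prior and Gaussian likelihood (my own choices, not implied by the derivation): the posterior computed on a grid via (7) matches the known closed-form Gaussian posterior up to grid error.

```python
import numpy as np

# A numerical sketch of Bayes' rule for densities (7). The Gaussian prior and
# likelihood are illustrative; the conjugate analytic posterior serves only as
# a reference for the grid computation.
def normal_pdf(x, mean=0.0, sd=1.0):
    return np.exp(-((x - mean) / sd)**2 / 2) / (sd * np.sqrt(2 * np.pi))

theta, dtheta = np.linspace(-15.0, 15.0, 30_001, retstep=True)
prior_sd, noise_sd, y_obs = 2.0, 1.0, 1.5

prior = normal_pdf(theta, 0.0, prior_sd)           # p(theta)
likelihood = normal_pdf(y_obs, theta, noise_sd)    # p(y | theta), as a function of theta
evidence = np.sum(likelihood * prior) * dtheta     # p(y) = integral of p(y|theta) p(theta) dTheta
posterior = likelihood * prior / evidence          # Bayes' rule (7)

# Closed-form posterior for this Gaussian-Gaussian model, for comparison
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)
post_mean = post_var * y_obs / noise_sd**2
analytic = normal_pdf(theta, post_mean, np.sqrt(post_var))

print(np.max(np.abs(posterior - analytic)))   # close to zero, up to grid error
```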

© 2008 Emanuel Winterfors