Biostatistics with R

Conditional Probability

There are occasions when we have to compute the probability of an event under the condition that another event has occurred. This is called conditional probability.

A big basket has three types of vegetables and four types of fruits. We randomly pick an object from the basket and find that it is a fruit. What is the probability that it is an orange?

This problem can be expressed in set theory notation. Consider the basket as a sample space S which has many subsets. For our question, let subset B represent the fruits and subset A represent the oranges among the fruits. The total number of fruits and vegetables together is the number of elements of the sample space, n(S).

Here we try to define the probability of event A (selecting an orange), considering only the elements of B (i.e., the fruits in the basket). If we restrict ourselves to the elements of B (fruits), what is the probability of observing event A (an orange) among them?

We denote this by \(\small{P(A|B)}\), the conditional probability of A given that B has occurred.

There are three possibilities with A and B. See the diagram below:

Possibility 1 : A is a subset of B. In this case, P(A|B) = 1.
If all the fruits are oranges, probability that a selected fruit is orange will be 1.

Possibility 2 : A and B are disjoint, without common elements. P(A|B) = 0.
If the fruits do not contain oranges, probability that a fruit picked up is orange will be 0.

Possibility 3 : The sample spaces of A and B intersect (i.e., they have common elements, indicated by the shaded region in the figure). This intersection is the region where we can say that A occurs whenever B has occurred. Let oranges constitute a fraction of the fruits in the basket. If we pick up a fruit randomly, the probability that it is an orange is decided by the fraction of oranges among the fruits. Thus, the number of elements common to A and B divided by the number of elements of B gives P(A|B):

\(\small{ P(A|B) = \dfrac{n(A \cap B)}{n(B)} = \dfrac{p(A \cap B)}{p(B)} }\)
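As a quick numerical check in R, suppose (hypothetically, since the text gives no counts) the basket holds 12 fruits, 5 of which are oranges. The counting form of the formula above gives P(A|B) directly:

```r
# Hypothetical counts for the basket example (not given in the text):
# event B = "object is a fruit", event A = "object is an orange"
n_B       <- 12   # n(B): number of fruits in the basket
n_A_and_B <- 5    # n(A intersect B): number of oranges (every orange is a fruit)

# Conditional probability by counting: P(A|B) = n(A and B) / n(B)
p_A_given_B <- n_A_and_B / n_B
p_A_given_B       # 5/12, about 0.417
```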

In general, the conditional probability is defined as follows:

Given two events A and B in the same event space with P(B) > 0, the conditional probability of A given B is defined as

\(\small{ P(A|B) = \dfrac{p(A \cap B)}{p(B)} = \dfrac{Joint~probability~of~A~and~B}{Probability~of~B} }\)

Example 1:
In a town affected by a season of epidemics, two diseases D1 and D2 are prevalent. It was estimated that 3.2% of the population contracted the disease D1 and 1.6% of the population contracted both diseases. Estimate the probability that a person who is affected by the disease D1 will also get the disease D2.

Using the expression for the conditional probability, we have

\( \small{P(D2|D1) = \dfrac{P(D1 \cap D2)}{P(D1)} = \dfrac{0.016}{0.032} = 0.5} \)

Therefore, there is a 50% chance that a person who contracted disease D1 will also get the disease D2.
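The arithmetic of Example 1 can be reproduced in R directly from the definition of conditional probability:

```r
# Data from Example 1
p_D1        <- 0.032   # P(D1): fraction of population with disease D1
p_D1_and_D2 <- 0.016   # P(D1 and D2): fraction with both diseases

# Conditional probability: P(D2|D1) = P(D1 and D2) / P(D1)
p_D2_given_D1 <- p_D1_and_D2 / p_D1
p_D2_given_D1          # 0.5
```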

The multiplication rule

From the expressions \(\small{ P(A|B) = \dfrac{P(A \cap B)}{P(B)} }\) and \(\small{ P(B|A) = \dfrac{P(A \cap B)}{P(A)}}\), we can write the expressions for the probability that the events A and B both occur as

\(\small{ P(A \cap B) = P(B|A) P(A) = P(A|B) P(B) }\)

The above relationship is called the multiplication rule.

Example : 40% of the families living in a city possess a two wheeler. 20% of those who possess a two wheeler also own a car. What is the probability that a family owns both a car and a two wheeler?

From the data given,

\( \small{P(two~wheeler) = 0.4,~~~~P(car|two~wheeler) = 0.2 }\)

Applying multiplication rule, we have

\(\small{ P(car~and~two~wheeler) = P(car|two~wheeler) P(two~wheeler) = 0.2 \times 0.4 = 0.08 }\)

Therefore, 8% of the families own a car and a two wheeler.
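The same multiplication-rule computation in R:

```r
# Data from the example
p_two_wheeler           <- 0.4   # P(two wheeler)
p_car_given_two_wheeler <- 0.2   # P(car | two wheeler)

# Multiplication rule: P(car and two wheeler) = P(car | two wheeler) * P(two wheeler)
p_car_and_two_wheeler <- p_car_given_two_wheeler * p_two_wheeler
p_car_and_two_wheeler            # 0.08
```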

Independent events

Two events are independent if the occurrence of one does not affect the probability of occurrence of the other.

For two independent events A and B,

\(\small{ P(A \cap B) = P(A) \times P(B)} \)

Thus, if two coins are tossed simultaneously, the probability of the first one landing with a head and the second one landing with a tail is given by,

\( \small{P(H~and~T) = P(H) \times P(T) = \dfrac{1}{2} \times \dfrac{1}{2} = \dfrac{1}{4} }\)

The above concepts can be extended to independence of any number of events:

\( \small{ P(A \cap B \cap C \cdots) = P(A) \times P(B) \times P(C) \times \cdots }\)

Example : Suppose event A consists of tossing a coin, event B consists of rolling a die and event C consists of drawing a King randomly from a deck of cards. If these three events are performed simultaneously, what is the probability of getting a head in the coin toss, getting a 5 in the die throw and drawing a King from the pack of cards?
We have, \(\small{ ~~~~~~~P(Head) = \dfrac{1}{2},~~~~P(5) = \dfrac{1}{6},~~~and~~~P(king) = \dfrac{4}{52}} \)

Since these three events are independent of each other,

\(\small{ P(Head~and~5~and~King) = P(H) \times P(5) \times P(king) = \dfrac{1}{2} \times \dfrac{1}{6} \times \dfrac{4}{52} = \dfrac{4}{624} = \dfrac{1}{156} } \)
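In R, the product rule for these three independent events is a one-line computation:

```r
# Probabilities of the three independent events
p_head <- 1/2    # head in a coin toss
p_five <- 1/6    # a 5 in a die throw
p_king <- 4/52   # a King from a 52-card deck

# Independent events: the joint probability is the product
p_all <- p_head * p_five * p_king
p_all            # 4/624 = 1/156, about 0.0064
```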

Dependent events

Two events are dependent if the occurrence of one affects the probability of occurrence of the other. In that case we use the conditional probability to compute the probability that both occur:

\( \small{P(A \cap B) = P(B|A) P(A) = P(A|B) P(B) }\)

Often we compute the probability of randomly selecting certain objects in succession out of many available objects. While doing so, two situations arise. We can draw objects successively from a collection either with replacement or without replacement.

Successive draws with replacement make them independent events. For example, consider a bag with 5 red marbles and 3 black marbles. Suppose we draw a marble randomly from the bag, note its color and replace it in the bag. The probability of drawing a red marble is \(\small{\frac{5}{8} } \). Now if we draw a marble a second time from this bag after replacement, the probability of getting a red marble in the second draw is again \(\small{\frac{5}{8} } \), exactly the same as in the first draw. So, the probability of getting a red marble in two successive draws is,

\(\small{P(red~in~first~draw) \times P(red~in~second~draw) = \dfrac{5}{8} \times \dfrac{5}{8} = \dfrac{25}{64} }\)

Suppose, after drawing a red marble the first time, we do not replace it in the bag. Then, during the second draw, there will be one red marble fewer than the first time. This changes the probability of drawing a red marble the second time. Since we now have 5 − 1 = 4 red marbles among the 7 marbles left in the bag, the probability of selecting a red marble the second time is \(\small{\dfrac{4}{7}}\). Therefore,

\(\small{ P(red~in~first~draw) = \dfrac{5}{8} }\)
\( \small{P(red~in~second~draw | red~in~first~draw) = \dfrac{4}{7}} \)

Therefore, \(\small{ P(red \cap red) = P(red~in~second~draw | red~in~first~draw) \times P(red~in~first~draw) = \dfrac{4}{7} \times \dfrac{5}{8} = \dfrac{5}{14} }\)
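A short R simulation illustrates the without-replacement case: the simulated frequency of two reds should approach the exact value \(\small{\dfrac{4}{7} \times \dfrac{5}{8} = \dfrac{5}{14}}\).

```r
set.seed(1)  # reproducible random draws

# Bag with 5 red and 3 black marbles
bag <- c(rep("red", 5), rep("black", 3))

# Exact probability of red on both of two draws without replacement
p_exact <- (5/8) * (4/7)   # = 5/14

# Monte Carlo check: repeat the two draws (without replacement) many times
n_trials <- 100000
both_red <- replicate(n_trials,
                      all(sample(bag, size = 2, replace = FALSE) == "red"))
mean(both_red)             # simulated estimate, close to 5/14 (about 0.357)
```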

We should remember the basic connection:
successive drawing with replacement ---> independent events
successive drawing without replacement ---> dependent events

Mutually exclusive events

Two or more events are mutually exclusive if the occurrence of one of them completely excludes the possibility of occurrence of all the others. For example, the occurrences of Head and Tail in a coin toss are mutually exclusive. The six outcomes 1, 2, 3, 4, 5 and 6 of a die throw are mutually exclusive events.

If two events are mutually exclusive, P(A and B) = 0. The probability P(A or B) that any one of them occurs equals the sum of their individual probabilities of occurrence.

We can extend this in general to any number of mutually exclusive events A,B,C,... to write,

\(\small{ P(A \cup B \cup C \cdots) = P(A) + P(B) + P(C) + \cdots }\)
Example :
In a die throw, compute the probability that the outcome is an odd number.

The outcome can be odd in one of three mutually exclusive ways: 1, 3, and 5. Therefore,

\(\small{ P(odd~number) = P(1) + P(3) + P(5) = \dfrac{1}{6} + \dfrac{1}{6} + \dfrac{1}{6} = \dfrac{3}{6} = \dfrac{1}{2} }\)
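The addition rule for these mutually exclusive outcomes, written out in R:

```r
# Each face of a fair die has probability 1/6
p_face <- 1/6

# Odd outcomes 1, 3 and 5 are mutually exclusive, so their probabilities add
p_odd <- p_face + p_face + p_face
p_odd   # 3/6 = 0.5
```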