Mathematical tools for natural sciences

A typical data set in this table consists of two or more variables, each in turn divided into two or more categories. The minimum size of a contingency table is \(\small{2 \times 2}\), i.e., two variables (e.g., sex and vaccination status), each divided into two categories (e.g., male/female and vaccinated/not vaccinated), giving \(\small{2 \times 2 = 4}\) combinations (cells) of data. There is no upper limit on the dimensions.

A contingency table is used to derive and interpret some important probability estimates from the data. We will provide two examples here: a classical \(\small{2 \times 2}\) table used for disease studies and clinical trials, followed by a table of larger dimensions for problem solving in probability theory.

Suppose a new diagnostic kit, developed for detecting a certain lung infection, is sent for clinical trial. The kit shows either a positive or a negative result for the disease when tried on a person. A random sample of 500 adults with the disease (confirmed by a more powerful scanning method) and another 500 healthy adults without the disease were tested. The results are as follows:

Of the 500 persons with disease present, 470 tested positive and 30 tested negative.

Of the 500 healthy persons with disease absent, 20 tested positive and 480 tested negative.

We prepare the following contingency table to represent the above data, with the four cells labelled a, b, c and d as shown:

Test result | Disease present | Disease absent | Sum |
---|---|---|---|
Positive | a=470 | b=20 | a+b=490 |
Negative | c=30 | d=480 | c+d=510 |
Sum | a+c=500 | b+d=500 | n=a+b+c+d=1000 |

We define the following important parameters for the test:

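1. The sensitivity, the probability of a positive result given that the disease is present (also known as the true positive rate):

\(\small{ Sensitivity = P(Positive|Present) = \dfrac{a}{a+c} = \dfrac{470}{500} = 0.94 }\)

2. The specificity, the probability of a negative result given that the disease is absent (also known as the true negative rate):

\(\small{ Specificity = P(Negative|Absent) = \dfrac{d}{b+d} = \dfrac{480}{500} = 0.96 }\)

3. The false positive rate, the probability of a positive result given that the disease is absent:

\(\small{False~positive~rate = P(Positive|Absent) = \dfrac{b}{b+d} = \dfrac{20}{500} = 0.040 }\)

4. The false negative rate, the probability of a negative result given that the disease is present:

\(\small{False~negative~rate = P(Negative|Present) = \dfrac{c}{a+c} = \dfrac{30}{500} = 0.060 }\)

5. The predictive value positive, the probability that the disease is present given a positive result:

\(\small{ Predictive~value~positive = P(Present|Positive) = \dfrac{a}{a+b} = \dfrac{470}{490} = 0.959 }\)

6. The predictive value negative, the probability that the disease is absent given a negative result:

\(\small{ Predictive~value~negative = P(Absent|Negative) = \dfrac{d}{c+d} = \dfrac{480}{510} = 0.941 }\)

7. The likelihood ratio positive:

\(\small{Likelihood~ratio~positive~~=~~\dfrac{P(Positive|Present)}{P(Positive|Absent)}~~=~~\dfrac{sensitivity}{1-specificity}~~=~~\dfrac{0.94}{0.04}~~=~~23.5 }\)

8. The likelihood ratio negative:

\(\small{Likelihood~ratio~negative~~=~~\dfrac{P(Negative|Present)}{P(Negative|Absent)}~~=~~\dfrac{1 - sensitivity}{specificity}~~=~~\dfrac{0.06}{0.96}~~=~~0.0625 }\)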

The sensitivity and specificity represent the ability of the test to detect the presence or absence of the disease correctly.

The positive and negative predictive values measure the extent to which we can trust the results of the test.

The higher the values of these four probabilities, the better the diagnostic test.

A false positive is a wrong identification of the disease by the test when it is not there. Similarly, a false negative is a wrong rejection of the disease when it is actually present. Both are undesirable outcomes, and any diagnostic procedure must minimize them. We will learn more about false positives and false negatives when we study statistical tests in the chapters ahead.
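The parameters defined above can be recomputed directly from the four cell counts. The following is a minimal sketch in Python; the function name `diagnostic_metrics` and the dictionary keys are illustrative choices, not part of the original text.

```python
# Cell counts from the diagnostic-kit table above.
a, b = 470, 20   # tested positive: disease present, disease absent
c, d = 30, 480   # tested negative: disease present, disease absent


def diagnostic_metrics(a, b, c, d):
    """Return the test parameters of a 2 x 2 diagnostic table as a dict."""
    sensitivity = a / (a + c)            # P(Positive|Present)
    specificity = d / (b + d)            # P(Negative|Absent)
    false_positive_rate = b / (b + d)    # P(Positive|Absent) = 1 - specificity
    false_negative_rate = c / (a + c)    # P(Negative|Present) = 1 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
        "predictive_value_positive": a / (a + b),   # P(Present|Positive)
        "predictive_value_negative": d / (c + d),   # P(Absent|Negative)
        "likelihood_ratio_positive": sensitivity / false_positive_rate,
        "likelihood_ratio_negative": false_negative_rate / specificity,
    }


metrics = diagnostic_metrics(a, b, c, d)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The likelihood ratios are computed from the false positive and false negative rates rather than from `1 - specificity` and `1 - sensitivity`; the two forms are algebraically identical, and this one avoids a small floating-point rounding step.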

The predictive value positive and the predictive value negative can be obtained from the measured false positives and false negatives using Bayes' theorem. For the predictive value positive and the predictive value negative, respectively, Bayes' theorem gives

\(\small{P(Present|Positive)~=~\dfrac{P(Positive|Present)~P(Present)}{P(Positive|Present)~P(Present) + P(Positive|Absent)~P(Absent)} }\)

\(\small{P(Absent|Negative)~=~\dfrac{P(Negative|Absent)~P(Absent)}{P(Negative|Absent)~P(Absent) + P(Negative|Present)~P(Present)} }\)

In the above expressions of Bayes' theorem, apart from the false positives and false negatives, the right hand side also requires the probability that the disease or condition is present and the probability that it is absent in order to obtain the predictive values positive and negative. In real life situations, it is not easy to get exact values of P(Present) and P(Absent).
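To illustrate how strongly the predictive values depend on P(Present), the sketch below evaluates the two Bayes expressions for the kit's sensitivity (0.94) and specificity (0.96). The function name `predictive_values` is an illustrative choice, and the 1% prevalence is a hypothetical value chosen only for comparison; it does not come from the trial data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Apply Bayes' theorem; `prevalence` plays the role of P(Present)."""
    p_present = prevalence
    p_absent = 1 - prevalence
    false_positive_rate = 1 - specificity   # P(Positive|Absent)
    false_negative_rate = 1 - sensitivity   # P(Negative|Present)

    # P(Present|Positive)
    ppv = (sensitivity * p_present) / (
        sensitivity * p_present + false_positive_rate * p_absent
    )
    # P(Absent|Negative)
    npv = (specificity * p_absent) / (
        specificity * p_absent + false_negative_rate * p_present
    )
    return ppv, npv


# P(Present) implied by the study design: 500 of the 1000 subjects had the disease.
ppv_study, npv_study = predictive_values(0.94, 0.96, 0.5)

# Hypothetical prevalence of 1%, used only for illustration.
ppv_rare, npv_rare = predictive_values(0.94, 0.96, 0.01)

print(f"P(Present) = 0.50: PPV = {ppv_study:.3f}, NPV = {npv_study:.3f}")
print(f"P(Present) = 0.01: PPV = {ppv_rare:.3f}, NPV = {npv_rare:.3f}")
```

With P(Present) = 0.5 the function reproduces the values read off the table (0.959 and 0.941). Under the assumed 1% prevalence the predictive value positive falls to about 0.19, which is why the table-based estimates should not be carried over to a population with a different disease frequency.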

(This problem is from the book "Biostatistics" by Wayne Daniel and Chad L. Cross, Chapter 3.)

The following table shows 1000 nursing school applicants classified according to the scores they made on a college entrance examination and the quality of the high school from which they graduated, as rated by a group of teachers.

Score / Quality of high school | Poor (P) | Average (A) | Superior (S) | Total |
---|---|---|---|---|
Low (L) | 105 | 60 | 55 | 220 |
Medium (M) | 70 | 175 | 145 | 390 |
High (H) | 25 | 65 | 300 | 390 |
Total | 200 | 300 | 500 | 1000 |

Calculate the probability that an applicant picked randomly from this group:

1. Made a low score on the examination.

2. Graduated from a superior high school.

3. Made a low score on the examination and graduated from a superior high school.

4. Made a low score on the examination, given that he or she graduated from a superior high school.

5. Made a high score or graduated from a superior high school.

From the table, we can answer these questions:

1. Probability of low score = \(\small{ P(L) = \dfrac{220}{1000} = 0.22 }\)

2. Probability of graduation from a superior high school = \(\small{ P(S)= \dfrac{500}{1000} = 0.5 }\)

3. Probability of a low score on the exam and graduation from a superior high school = \(\small{P(L \cap S) = P(L|S)~P(S) = \dfrac{55}{500} \times \dfrac{500}{1000} = 0.055}\)

4. Probability of low score given that from superior school = \(\small{P(L|S) = \dfrac{P(L \cap S)}{P(S)} = \dfrac{0.055}{0.5} = 0.11 }\)

5. Probability of a high score or graduated from a superior school = \(\small{P(H \cup S) = P(H) + P(S) - P(H \cap S) = \dfrac{390}{1000} + \dfrac{500}{1000} - \dfrac{300}{1000} = 0.59 }\)
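The five answers can be checked numerically from the cell counts. The following is a minimal sketch in plain Python; the dictionary layout and the helper names `p_score`, `p_school` and `p_joint` are illustrative choices, not part of the original problem.

```python
# Cell counts: rows are examination scores, columns are high school quality.
counts = {
    "L": {"P": 105, "A": 60, "S": 55},
    "M": {"P": 70, "A": 175, "S": 145},
    "H": {"P": 25, "A": 65, "S": 300},
}

# Grand total of applicants (1000 for this table).
n = sum(sum(row.values()) for row in counts.values())


def p_score(score):
    """Marginal probability of a score category (row total / n)."""
    return sum(counts[score].values()) / n


def p_school(school):
    """Marginal probability of a school category (column total / n)."""
    return sum(row[school] for row in counts.values()) / n


def p_joint(score, school):
    """Joint probability of a score and a school category (cell / n)."""
    return counts[score][school] / n


p_L = p_score("L")                                   # question 1
p_S = p_school("S")                                  # question 2
p_L_and_S = p_joint("L", "S")                        # question 3
p_L_given_S = p_L_and_S / p_S                        # question 4
p_H_or_S = p_score("H") + p_S - p_joint("H", "S")    # question 5

print(p_L, p_S, p_L_and_S, round(p_L_given_S, 2), round(p_H_or_S, 2))
```

Here the joint probability is read directly from the cell count, 55/1000, which agrees with the product \(\small{P(L|S)~P(S)}\) used in answer 3.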