Biostatistics with R

Counting possible outcomes

In the previous section, we defined the probability of event to be the ratio of the number of favourable outcomes to the total number of possible outcomes of the experiment. In this section, we will learn the methods of counting the possible outcomes of an experiment under various conditions.

The multiplication principle

Suppose we have an experiment \(\small{ E_1 }\) with \(\small{n_1}\) possible outcomes, and another experiment \( \small{E_2}\) with \( \small{n_2}\) possible outcomes.


We define a composite experiment \(\small{E_1 E_2} \) as "performing \(\small{ E_1} \) first followed by \( \small{E_2}\) and noting down the results". Thus, if \(\small{ E_1 }\) and \(\small{ E_2 }\) are single coin tosses, then toss coin 1 followed by coin 2 and note down the composite result, like \(\small{ \{HT\}}\)or \(\small{\{TT\} }\).


The miltiplication principle states that if the experiment \(\small{ E_1 }\) has \(\small{n_1}\) possible outcomes and experiment \(\small{ E_2 }\) has \( \small{n_2}\) possible outcomes, then the composite experiment \( \small{E_1 E_2 }\) has \( \small{n_1 n_2}\) possible outcomes.


Thus the composite experiment of a two coin toss, in which each experiment has two outcomes of H and T, will have \(\small{ 2 \times 2 = 4 }\) possible outcomes namely \( \small{\{HH, HT, TH, TT \} } \)

Permutation of n objects

Given three objects labelled \(\small{ A_1} \), \(\small{ A_2 }\) and \(\small{ A_3 }\), in how many ways we can arrange them next to each other?.
After considering various possible shuffles, we come up with the following six arrangements:

        \(\small{A_1 A_2 A_3 }\), \( \small{A_1 A_3 A_2} \), \( \small{A_2 A_1 A_3 }\), \( \small{A_2 A_3 A_1} \), \( \small{A_3 A_1 A_2} \), \( \small{A_3 A_2 A_1 }\)


In order to calculate the number of ways in which n objects can be arranged, we proceed as follows:

We have n positions for n objects. We define n experiments, each one of which consists of going to a position and choosing an object for it.


For first position, an object can be chosen in n ways.
For the second position, we can select an obeject in (n-1) ways.
For the third position, we have (n-2) choices and so on.

The composite experiment consists of choosing object for all positions.

Therefore, by multiplication principle, we write,


Number of ways of arranging n objects = \(\small{n(n-1)(n-2)(n-3)....4.3.2.1 = n! }\)

Here, the symbol n!(read as "n factorial") represents the product \(\small{ 1.2.3.4.5.....(n-1)(n-1)n} \)


The number of ways of arranging n objects among themselves is called the permutation of n objects . We remember,


\( ~~~Permutation~of~n~objects = n!~~~ \)



Permutation of n objects taken r at a time

Sometimes we have to pick up a subset of r objects out of n objects. How many ways this can be done?


In this case, we have to fill r positions by picking from n objects.


The first position can be filled in n ways.
Once the first position is filled, the second can be filled up in (n-1) ways.
Once the secon is filled, the third position has (n-2) choices.
When this is continued, finally the \(\small{r^{th}}\) position has (n-r+1) choices.


Therefore,

Number of ways of filling r positions by selcting out of n objects is given by,
                  \(\small{_nP_r } \) = \( \small{n(n-1)(n-2)(n-3)....(n-r+1) } \)
Here, the symbol \( \small{_nP_r} \) stands for the permutation of n objects taken r at a time.


We can re-write the above expression as,
\(\small{_nP_r } \) = \( \small{n(n-1)(n-2)(n-3)....(n-r+1) \times \dfrac{(n-r)(n-r-1)(n-r-2)....3.2.1}{(n-r)(n-r-1)(n-r-2)....3.2.1} } \)
realizing that the numerator of the above expression is \( \small{n!}\) and the denominator is \( \small{(n-r)!} \), we get


Permutation of n objects taken r at a time = \(\small{ ~~_nP_r = \dfrac{n!}{(n-r)!}~~ }\)


Example :
The number of ways we can pick 4 cards out of a pack of 52 cards is, \(\small{_nP_r = _{52}P_4 = \dfrac{52!}{(52-4)!} = 52 \times 51 \times 50 \times 49 = 6497400 }\)



One important point to be understood : The permutation \(\small{_nP_r } \) is sample without replacement. ie., when an object is selected, it is removed, and the next object will have one choice less.



We can also sample r objects out of n by replacing the sample after every selection. In this method, every selection will have same number of choices which is n.

Therefore,


Number of ways of selecting r out of n objects with replacement \(\small{ = n \times n \times ..... r~times = n^r} \)


Example :
How many 3 digit numbers can be formed from 10 digits [0-9], when the numbers are allowed to repeat?.


Since we can repeat a number, we have 10 choices for every digit.


By applying multiplication principle, we can get \(\small{10 \times 10 \times 10 = 1000 }\) different 3 digit numbers.








Combination (indistinguishable permutation)

In the case of permutation of n objects taken r at a time, it was understood that each one of the object had an identity. ie., they are distingushable . We know that we can arrange r objects in \( r! \) ways. If they are distinguishable, each one of these \( r! \) arrangments will be different.


Next, we will deal with the case of n objects which are indistinguishable (ie., identical). They can either be really identical or we don not bother to differentiate between them. We can choose r objects out of these n in \( _nP_r \) permutations. For each of these group of r objects each,the \( r! \) arrangements among them will be identical or redundant. They all look same. Therefore, effectively, there are only \(\dfrac{_nP_r}{r!} \) ways in which we can choose r out of n indistinguishable objects .


Thus the number of ways we can select r objects out of n objects in an unordered fashion is denoted by the symbol \(\small{_nC_r}\) which is defined by,


                                \(\small{_nC_r = \dfrac{_nP_r}{r!} = \dfrac{n!}{r!(n-r)!} }\)




Each one of the \(\small{_nC_r}\) unordered subsets is a combination of n objects taken r at a time . Thus the combination \(\small{_nC_r}\) is an indistinguishable permutation.


Example :
Out of 10 identical patients enrolled for a clinical tria, 4 are to be randomly choasen to test a particular drug. In how many ways they can be selected?


Since the four patients, once selected, will be treated with same drug, they will not be distinguished in the experiment. Therefore, the number of ways the 4 patients can be selcted is
                  \(\small{_{10}C_4 = \dfrac{10!}{4! 6!} = 210 }\)



Distinguishable permutations

We take a three letter word MOM and consider all the permutations of the letters in it : MOM, OMM, MOM, MMO, OMM, MMO. Of these 6 permutations, only three are distinguishable (unique): MOM, OMM and MMO.


In general, if N is the total number of objects which are divided into k groups with identical members, with frequencies \(\small{n_1, \ n_2\,..., n_k }\) within each group, then the distinguishable permutations can be determined by dividing the total number of permutations by the product of factorials of frequencies:


\( \small{distinguishable~permutations~=~\dfrac{N!}{n_1! n_2! n_3!.....n_k!} }\)
example :
In a DNA sequence with 12 nucleotides, letter 'A' appears 4 times, 'T' appears 3 times, 'G' appears 4 times and 'C' appears once. How many unique 12 letter sequences can be created by shuffling them?.

Number of unique sequrnces = \(\small{ \dfrac{12!}{4! 3! 4! 1!} = 138600 } \)




R Scripts

R has many library functions for sampling and permuting the given set of objects.


Given a set of n objects(can be numbers, strings or characters), we can find various permutations and combinations of them taken r at a time.

Random sampling from a set of objects

The sample() can be called to sample r elements from a given vector x of n elements. We can actually get the permuted elements. In addition to n,r and x, this function takes one more parameter to decide whether we allow repeat of elements in the permutations. See below:


###### Randoml sampling from a set of objects # Create a vector integers from 1 to 100 x = c(1:100) # we can randomly sample 20 of them with equal probability s = sample(x,20) print(s)

The above code prints the following 20 random samples from given set.

[1] 33 16 73 48 34 17 20 95 67 59 83 39 22 2 49 79 26 74 45 25

Permutation and combination of objects

We can generate the possible permutations of \(\small{n}\) elements of a set taken \(\small{r}\) at a time. We use the functions called permutations() in the library gtools . The boolean parameter repeats.allowed allows one to create a list of combinations which are having unique elements:


##### To get the permutation of n objects library(gtools) ## create a list of n objects x = c('A','B','C','D') ## unique elements of x should be >= r ## get and print permuted list of n objects in x taken r at a time ## This function returns a matrix with permuted elements along rows. permlist = permutations(n=length(x), r=3, v=x, repeats.allowed=FALSE) print(permlist)

Running the above script creates this output on the screen:

[,1] [,2] [,3] [1,] "A" "B" "C" [2,] "A" "B" "D" [3,] "A" "C" "B" [4,] "A" "C" "D" [5,] "A" "D" "B" [6,] "A" "D" "C" [7,] "B" "A" "C" [8,] "B" "A" "D" [9,] "B" "C" "A" [10,] "B" "C" "D" [11,] "B" "D" "A" [12,] "B" "D" "C" [13,] "C" "A" "B" [14,] "C" "A" "D" [15,] "C" "B" "A" [16,] "C" "B" "D" [17,] "C" "D" "A" [18,] "C" "D" "B" [19,] "D" "A" "B" [20,] "D" "A" "C" [21,] "D" "B" "A" [22,] "D" "B" "C" [23,] "D" "C" "A" [24,] "D" "C" "B"

The function combinations () from the sample library generates possible combinations of n objects taken r at a time. This is demonstrated here with the output:


##### To get the permutation and combination of n objects library(gtools) ## create a list of n objects x = c('A','B','C','D') ## unique elements of x should be >= r # To get a combination of above list ## This function returns a matrix with permuted elements along rows. y = combinations(n=length(x), r=3, v=x) print(y)

[,1] [,2] [,3] [1,] "A" "B" "C" [2,] "A" "B" "D" [3,] "A" "C" "D" [4,] "B" "C" "D"


Number of Permutation and combination

In order to get the number of permutations and combinations of n objects taken r at a time, we write R functions to translate the corresponding formulas \(\small{ ~~_nP_r = \dfrac{n!}{(n-r)!}~~ }\) and \(\small{_nC_r = \dfrac{n!}{r!(n-r)!} }\). In the script below, we have written the two functions and called them:


## Function to compute permutation of n items taken r at a time npermute = function(n,r) { nPr = factorial(n)/factorial(n-r) return(nPr) } ## Function to compute the combination of n items taken r at a time ncomb = function(n,r) { nCr = factorial(n)/(factorial(r)*factorial(n-r)) return(nCr) } ### We use the functions n = 10 r = 6 np = npermute(n,r) nc = ncomb(n,r) print(paste("Permutation of ", n, " things taken ",r," at a time = ", np)) print(paste("Combination of ", n, " things taken ",r," at a time = ", nc))

here is the output:

[1] "Permutation of 10 things taken 6 at a time = 151200" [1] "Combination of 10 things taken 6 at a time = 210"