Biostatistics with R

Gamma distribution

In the previous section, we learnt that the waiting time until the first event in a Poission process follows an exponential distribution. Now we are interested in the waiting time \(\small{W}\) until certain number of events arrive.

let \(\small{W}\) be the waiting time until \(\small{\alpha}\) events arrive in a Poisson process with parameter \(\small{\lambda}\) (mean number of events per unit time). What will be the distribution of the continuous variable \(\small{W}\)?

Since the mean number of events that arrive in a time interval \(\small{w}\) is \(\small{\lambda w}\), we can write the Poisson probability for the arrival of \(\small{k }\) events in time \(\small{w}\) as \(\small{ \dfrac{(\lambda w)^k {\large e}^{-\lambda w}}{k!} }\)

For the arrival of \(\small{\alpha}\) events, we write the cumulative probability of having a waiting time upto \(\small{w}\) as,

\(~~~~~~~~\small{g(w)~=~P(W \leq w) = 1 - P(W \gt w)}\)

\(\small{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=~1 - P(fewer~than~\alpha~events~arrive~in~interval~[0,w])} \)

\(\small{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~=~1 - \sum\limits_{k=0}^{\alpha-1} \dfrac{(\lambda w)^k {\large e}^{-\lambda w}}{k!} }\)

The probability distribution function (pdf) is the derivaive of the cumulative probability function. Skipping the inermediate steps, we directly write the derivative of cumulative probability distribution as,

\(\small{P_g(\lambda,w,\alpha) = \dfrac{d(g)}{dw} = \dfrac{\lambda (\lambda w)^{\alpha-1}}{(\alpha-1)!} {\large e}^{-\lambda w } }~~~~~~~~~~~\) where \(\small{w \gt 0 }\)




The above expression gives the probability of waiting time \(\small{w}\) until the arrival of \(\small{\alpha}\) events in a Poisson process in which the mean number of events per unit time is \(\small{\lambda}\)

The above probability distribution is said to be of Gamma distribution . this name is derived from the gamma function defined by,

\(~~~~~~~~~~~~~~~~~\small{\Gamma(t) = \int\limits_0^\infty y^{t-1}e^{-y}dy~~~~~~~~~~~~t \gt 0}\)

The gamma function has an important property : \( \small{~~~~~\Gamma(n) = (n-1)! } \)

Using this result in the pression of \(\small{P_g}\) given above and letting \(\small{\theta = \dfrac{1}{\lambda} }\), we get the pdf of gamma distribution as,

\(~~~~~~~~~~~~~~~~\small {f(x) = \dfrac{1}{\Gamma(\alpha) \theta^\alpha} {\large x}^{\alpha-1}{\large e}^{-x/\theta},~~~~~~~~~~~0 \leq x \lt \infty }\)







Thus, in a Poisson process with \(\small{\lambda}\) number of events per unit time, the waiting time \(\small{x}\) until the arrival of \(\small{\alpha^{th} }\) event follows a Gamma distribution with parameters \(\small{\alpha}\) and \(\small{\theta = \dfrac{1}{\lambda} }\).

Here, \(\small{\theta}\) is the waiting time until the first event, and \(\small{\alpha}\) is the number of events for which we are waiting to occur.

\(\small{\alpha}\) is called the shape parameter and \(\small{\theta}\) the scale parameter of the distribution. (See the plots of the PDF below).




The mean and variance of the gamma distribution are given by,

\(~~~~~~~~~~~~~~~~~~~~~\small{\mu = \dfrac{\alpha}{\lambda} = \alpha \theta ~~~~~~~~~}\)

\(~~~~~~~~~~~~~~~~~~~~~\small{\sigma^2 = \dfrac{\alpha}{\lambda^2} = \alpha \theta^2 ~~~~~~~~~~}\)









The plot of the gamma distribution



Example-1 : In the emergency ward of a city hospital, on an average 1 case is admitted every hour. Compute the probability that we have to wait 6 hours to get 4 cases.

The mean number of cases per hour follows a Poisson process with mean events \(\small{\lambda = 1}\) per hour. We have the waiting time \(\small{x = 6}\) hours until the arrival of \(\small{\alpha th}\) event, where \(\small{\alpha = 4}\).

One case every hour means we have a waiting time of \(\small{\theta = \dfrac{1}{\lambda} = 1}\) hour until the firat event.

With this information, we can compute the probability of waiting time of 6 hours until we have the arrival of \(\small{\alpha=4 }\) cases as,
$$\small{P(0 \le x \lt 6) = \int_0^6 \dfrac{1}{\Gamma(\alpha) \theta^\alpha} {\large x}^{\alpha-1}{\large e}^{-x/\theta}dx = \int_0^6 \dfrac{1}{\Gamma(4) 1^4} {\large x}^{4-1}{\large e}^{-x/1}dx = 0.8489 } $$,
Here, we have used the R function call pgamma(6, shape=4, scale=1) for computing the above integral. This method is explained in the R sceipt below:


R-scripts

R provides functions for computing gamma distribution with probability density \(~~~~~~~~~~~~~~~~\small {f(x) = \dfrac{1}{\Gamma(\alpha) \theta^\alpha} {\large x}^{\alpha-1}{\large e}^{-x/\theta},~~~~~~~~~~~0 \leq x \lt \infty }\)
where \(\small{\alpha}\) is the scale parameter and \(\small{\theta}\) is the shape parameter.


 dgamma(x, shape, scale) --------------> returns the gamma probability density for a given x value, shape and           
                                            scale parameters 
                            
 pgamma(x, shape, scale) --------------> returns the cumulative probability from 0 upto x from a 
                                            gamma distribution with the given shape and scale paramters.


 qgamma(p, shape, scale) ---------------> returns the  x value at which the cumulative probability is p from a 
                                               gamma distribution with the given scale and shape parameters.

 rgamma(n, shape, scale) ---------------> returns n random numbers in the range [0 , infinity] from a                                                                                
                                            gamma distribution with the given scale and shape parameters.



The R script below demonstrates the usage of the above mentioned functions:


##### Using R library functions for Gamma distribution ## Probability density for a given x, from a distribution with shape and scale values: prob_dens = dgamma(x=6, shape=4, scale=1) prob_dens = format(prob_dens, digits=4) print(paste("Gamma probability density for x=6, scale=4, shape=1 is = ", prob_dens)) ## Cumulative probability upto x=6 for a gamma pdf with the given scale and shape values. cum_prob = pgamma(q=6, shape=4, scale=1) ## function uses 'q' for x value cum_prob = format(cum_prob, digits=4) print(paste("gamma cumulative probability upto x=6 for scale=4, shape=1 is = ", cum_prob)) ## The value of variable x upto which the cumulative probability is p, for a ## gamma distribution with given shape and scale parameters x = qgamma(p = 0.85, shape=4, scale=1) x = format(x, digits=4) print(paste("varable x at which cumulative probability is 0.85 is = ", x)) ## Generate 5 random deviates from a gamma distribution with parameters shape=4, scale=1 print(rgamma(n=6, shape=4, scale=1)) ## we will draw 2 graphs on the same plot par(mfrow=c(2,1)) ## Drawing a gamma distribution pdf in x = 0 to 12, with scale=1, shape=4 x = seq(0,12, 0.5) gamma_pdf = dgamma(x, shape=4, scale=1) plot(x, gamma_pdf, col="blue", type="b", xlab="Gamma varbale x", ylab="Gamma PDF") text(8.0, 0.2, "shape = 4, scale=1") ## Plotting the frequency histogram of gamma random deviate for shape=4, scale=1 hist(rgamma(n=10000, shape=4, scale=1), breaks=30, col="red", xlab="gamma variable x", ylab="frequency", main = " ") text(11, 800, "shape = 4, scale = 1")


Executing the above code prints the following lines and displays the following plots on the screen :


[1] "Gamma probability density for x=6, scale=4, shape=1 is = 0.08924" [1] "gamma cumulative probability upto x=6 for scale=4, shape=1 is = 0.8488" [1] "varable x at which cumulative probability is 0.85 is = 6.014" [1] 1.593189 2.012809 2.392053 2.754715 2.400868 3.282750