Mathematical tools for natural sciences

Suppose we perform a sequence of Bernoulli trials and count the number of successess and failures.
If \(\small{p}\) is the probability of success in any trial, ** how many trials we have to perform until we observe \(\small{r}\) successes?.**

Here we are interested in a random variable \(\small{X}\) that represents the **number of trials
needed to observe the \(\small{r^{th}}\) success**. Thus we may observe \(\small{r=5}\) successes in
\(\small{X=8}\) trials or in \(\small{X=11}\) trials or in \(\small{X=9}\) trials. X is the trial number in which \(\small{r^{th} }\) success is observed, with \(\small{X \geq r }\). The observed values of the variable X follow the **negative binomial distribution **.

Another version of the negative binomial distribution is written in terms of the ** number of failures y before \(\small{r^{th}}\) success. **

We will now derive expression for both the probability distribution for both the vairables \(\small{ x}\) and \(\small{ y}\) :

** Derivation of the negative binomial probability:**

\(\small{P_{nb}(x,p,r) = P(x~trials~for~r~successes)}\)

\(~~~~~~~~~~~~~~~~=~P(r-1~successes~in~the~first~x-1~trials) \times P(success~on~r^{th}~trial) \)

We have,

\( \small{P(r-1~successes~in~the~first~x-1~trials)~=~_{x-1}C_{r-1} p^{als~r-1} (1-p)^{x-r}~~~~~(binomial~distribution) }\)

\(\small{P(success~on~r^{th}~trial) = p }\)

We can thus write,

\( P_{nb}(x,p,r) = \small{P(x~trials~for~r~successes)~=~_{x-1}C_{r-1} p^{r-1} (1-p)^{x-r} \times p~=~_{x-1}C_{r-1} p^{r} (1-p)^{x-r} } \)

** Version 2 : Number of failures \(\small{y}\) before the \(\small{r^{th}}\) success. **

Since \(\small{x }\) is the total number of trials before \(\small{r^{th}}\) success,

\(~~~~\small{y = number~of~failures~before~r^{th}~success = x-r}\)

We have,\(~~~~~~~~~ \small{P_{nb}(x,p,r)~=~_{x-1}C_{r-1} p^r (1-p)^{x-r} }\)

Substituting from \(\small{y = x-r }\) for \(\small{x}\) in the above expression, we get

\(~~~~~~~~~~\small{P_{nb}(y,p,r)~=~_{y+r-1}C_y p^r (1-p)^y }\)

The probability density functions for a negative binomial distribution for the ** number of trials \(\small{x}\) required for observing \(\small{r}\) successes** and the **number of failures \(\small{y}\) before the \(\small{r^{th}}\) success**, with \(\small{p}\) being the probability of success in one trial, are summarised below. The mean and variance of these two variables are also given, without derivation:

\(\small{P_{nb}(y,p,r)~=~_{y+r-1}C_y p^r (1-p)^y,~~~~~~~\mu_y = \dfrac{r(1-p)}{P},~~~~~~~~\sigma^2_y = \dfrac{r(1-p)}{p^2} }\)

For any real number \(\small{w}\), the expression \(\small{(1-w)^{-r} }\) with a negative exponent -r can be expanded as a sum using binomial theorem:

\( \small{ (1-w)^{-r}~=~\sum\limits_{x=r}^{\infty} {_{x-1}C_{r-1}} w^{x-r} }\)

The expression for the probability distribution given before is similar to the above negative binomial expansion except for the multiplicative constant \(\small{p^r}\). Hence the name * negative binomial distribution *.

The figure below displays the plots of negative binomial probability distribution for various values of probability \(\small{p}\) of success and the number of failures \(\small{r}\).

If we assign success to finding a child with cesarean birth, this is a negative binomial problem with probability of 12 trials required for 5 successes, with p=0.09 being the probability of success in a trial.

\( \small{P(12~trials~for~5~successes) = _{12-1}C_{5-1} (0.09)^{5} (1-0.09)^{12-5} = _{11}C_{4} (0.09)^{5} (0.91)^{7} = 0.001}\)

Another way of looking at this problem is to consider encountering 12 births for 5 scissarians as 7 non-scissarians before encountering 5 scissarian births. Then, if we take observing scissarian as success and non-scissarian as failure, we can apply second formula with \(\small{y=7}\), \(\small{r=5}\) and \(\small{p=0.09}\) to get,

\( \small{P(observing~7~failures~before~5~successes) = _{y+r-1}C_y p^r (1-p)^y = _{7+5-1}C_7 (0.09)^5 (1-0.09)^7 = 0.001}\)

We will now learn to compute the negative binomial probability distribution in R

The usage of these four functions are demonstrated in the R script here. With comments, the script lines are self explanatory:

## Negative binomial distribution functions in R ## p = probability of success in a trial ## r = number of successes observed ## y = number of failures before r successes ## Computing negative binomial probability density for observing y failures before r successes: y = 5 r = 3 p = 0.3 prob = dnbinom(y, r, p) print(paste("Negative binomial probability for " , y , " failures before " , r , " successes is ",prob)) ### Function for computing cumulative probability from x=0 to a given value. ### pnbinom(x,r,p) gives cumulative probability for a negative binomial distribution from 0 to x. x = 5 cumulative_probability = pnbinom(x,r,p) print(paste("cumulative negative binomial probability from x=0 to", x," for ", x," failures before ",r, " successes is ", cumulative_probability) ) ### Function for finding x value corresponing to a cumulative probability cumulative_probability = 0.5 x_value = qnbinom(cumulative_probability, r, p) print(paste("number of failures corresponding to the given cumulative binomial probability of ",cumulative_probability," is ",x_value) ) ### Function that returns 4 random deviates from a Negative Binomial distribution of given (n,p) deviates = rnbinom(4, r, p) print(paste("4 binomial deviates : ")) print(deviates) ### We plot a binomial density distribution using dnbinom() r = 10 p = 0.3 y = seq(0,60) pdens = dnbinom(y,r,p) plot(y,pdens, type="h", col="red", xlab = "Negative Binomial variable y", ylab="Negative binomial probability", main="Negative binomial distribution (r=10, p=0.3)") ## We generate frequency histogram of binomial deviates using rbinom() r = 10 p = 0.3 xdev= rnbinom(10000, r, p) plot(table(xdev), type="h", xlab="Negative binomial variable y", ylab="frequency",main="Distribution of negative binomial random deviates (r=10,p=0.3)")

Executing the above script in R prints the following results and figures of probability distribution on the screen:

[1] "Negative binomial probability for 5 failures before 3 successes is 0.09529569" [1] "cumulative negative binomial probability from x=0 to 5 for 5 failures before 3 successes is 0.44822619" [1] "number of failures corresponding to the given cumulative binomial probability of 0.5 is 6" [1] "4 binomial deviates : " [1] 6 4 2 5