## Geometric distribution

The binomial distribution gives the probability of getting $x$ successes in a sequence of $n$ Bernoulli trials. Here we ask a different question: if $p$ is the probability of success in each trial, how many trials must we perform until we observe the first success?

Suppose we perform a sequence of Bernoulli trials and note the trial number $x$ on which the first success occurs. This $x$ follows a geometric distribution.

For example, let 'S' denote success and 'F' denote failure in a Bernoulli trial. We repeat the trial many times and get the sequence FFFFSFSS.... Here the first success occurs on the fifth trial, and hence $x = 5$. If we start over and get the sequence FFSFSSFFSF...., we have $x=3$.

If we repeat this experiment many times, what will be the distribution of $x$?

We derive an expression for the probability mass function of the geometric distribution as follows.

If $p$ is the probability of success and $1-p$ is the probability of failure in a single Bernoulli trial, and the trials are independent of each other, then

$\small{P(x)~=~P(\text{first success on trial}~x) }$
$\small{~~~~~~~~=~P(\text{first}~x-1~\text{trials result in failure and the}~x^{th}~\text{trial is a success}) }$
$\small{~~~~~~~~=~P(\text{first}~x-1~\text{trials result in failure}) \times P(x^{th}~\text{trial is a success}) }$
$\small{~~~~~~~~=~(1-p)^{x-1} \times p }$

Therefore, the probability mass function of the geometric distribution, which gives the probability of observing the first success on the $x^{th}$ trial when $p$ is the probability of success in each trial, is given by
$\small{P_{ge}(x) = p(1-p)^{x-1}}~~~~~~~~~~\text{for}~~x = 1,2,3,4,...$
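The formula can be compared with a brute-force simulation of the experiment described earlier. The following is only an illustrative sketch: the helper names `first_success_trial` and `geom_pmf` are made up for this example, and the seed, the value `p = 0.3` and the 20000 repetitions are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

# Hypothetical helper: perform Bernoulli trials until the first success
# and return the trial number on which that success occurred.
first_success_trial <- function(p) {
  x <- 1
  while (runif(1) >= p) {  # runif(1) < p is treated as a success
    x <- x + 1
  }
  x
}

# Hypothetical helper: the formula P(x) = p * (1 - p)^(x - 1)
geom_pmf <- function(x, p) p * (1 - p)^(x - 1)

p <- 0.3
trials <- replicate(20000, first_success_trial(p))

# Empirical relative frequencies versus the formula for x = 1..6
empirical <- sapply(1:6, function(x) mean(trials == x))
theory <- geom_pmf(1:6, p)
print(round(rbind(empirical, theory), 3))
```

With this many repetitions, the two rows should agree to roughly two decimal places.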

#### Why the name "Geometric distribution"?

The geometric series is given by
$~~~~~~~~~~~~~~~\small{ \sum\limits_{k=0}^{\infty} ar^k~=~a + ar + ar^2 + ar^3 + ar^4 + ....}$
where each term is the previous one multiplied by the common ratio $r$, and the series converges for $\small{-1 < r < 1 }$.

Now consider the sum of the geometric probabilities over the first $n$ trials, with $k = x-1$:
$~~~~~~~~\small{\sum\limits_{x=1}^n p(1-p)^{x-1} = \sum\limits_{k=0}^{n-1} p(1-p)^{k} }$
With $a = p$, $r = 1-p$ and $m = n-1$, the terms of this sum form a geometric progression $\small{\sum\limits_{k=0}^m ar^k }$, hence the name geometric distribution.
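Since $0 < 1-p < 1$ whenever $0 < p < 1$, the partial sums above approach 1 as $n$ grows, as any probability distribution must. A small numerical check is sketched below; it uses the standard closed form of a finite geometric sum, $a(1-r^n)/(1-r)$, which reduces to $1-(1-p)^n$ here. The helper name `partial_sum` and the chosen values of `p` and `n` are assumptions made for illustration.

```r
p <- 0.3
n <- c(1, 5, 10, 50)

# Hypothetical helper: sum of p * (1 - p)^(x - 1) for x = 1..n
partial_sum <- function(n, p) sum(p * (1 - p)^(seq_len(n) - 1))

sums <- sapply(n, partial_sum, p = p)

# Closed form of the finite geometric sum with a = p and r = 1 - p
closed_form <- 1 - (1 - p)^n

print(cbind(n, sums, closed_form))
```

The last row should be indistinguishable from 1 at printed precision.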

#### The mean and variance of the Geometric distribution

The expressions for the mean and the variance of the geometric distribution are given below (derivation not shown):

$~~~~~~~~~~\small{\mu = \dfrac{1}{p} }~~~~~~$

$~~~~~~~~~~\small{\sigma^2 = \dfrac{1-p}{p^2} }~~~~~~$
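Although the derivation is not shown, the two expressions can be checked by simulation. The sketch below assumes `p = 0.25`, an arbitrary seed and 100000 samples. Note that `rgeom()` in R returns the number of failures before the first success, so 1 is added to get the trial number used in this section.

```r
set.seed(42)  # arbitrary seed, used only to make the run reproducible
p <- 0.25

# rgeom() counts failures before the first success;
# adding 1 converts each deviate to the trial number of the first success.
x <- rgeom(100000, p) + 1

print(c(sample_mean = mean(x), formula_mean = 1 / p))
print(c(sample_var = var(x), formula_var = (1 - p) / p^2))
```

Adding a constant shifts the mean by 1 but leaves the variance unchanged, which is why the variance formula is the same in both conventions.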

#### The plot of Geometric probability distribution

The figure below shows the probability mass function of the geometric distribution for various values of the probability of success $p$.

Example-1 : About $\small{10\%}$ of mangos in a fruit basket are not ripe. If we randomly select 6 mangos from this basket, what is the probability that the first five are ripe and the sixth one is unripe?

We apply the geometric distribution with $p=0.1$ (picking an unripe mango is the "success" here) and $x=6$.

$\small{P_{ge} = p(1-p)^{x-1} = 0.1\times (1-0.1)^{6-1} = 0.059 }$

Thus, there is about a $\small{5.9\%}$ chance that the first five mangos we pick are ripe and the sixth one is unripe.
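The arithmetic of this example can be reproduced in R, either directly from the formula or with the built-in `dgeom()`. This is a minimal sketch; the variable names are chosen here for illustration. Because `dgeom()` takes the number of failures before the first success, the five ripe mangos are passed as `x - 1`.

```r
p <- 0.1   # probability that a picked mango is unripe ("success")
x <- 6     # trial on which the first unripe mango appears

by_formula <- p * (1 - p)^(x - 1)

# dgeom() expects the number of failures, here the 5 ripe mangos
by_dgeom <- dgeom(x - 1, p)

print(c(by_formula, by_dgeom))
```

Both values should agree and round to 0.059.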

## R scripts

R's built-in stats package provides the following four basic functions for the geometric distribution. These functions are parameterized by the number of failures before the first success, not by the trial number, so a first success on trial number k corresponds to x = k-1.



x  = number of failures before the first success, x = 0, 1, 2, ...
(i.e., the first success occurs on trial number x+1)

p  =  probability of success in a trial

dgeom(x,p)  ----->  Returns the probability of exactly x failures before the first success, p(1-p)^x.

pgeom(x,p)  ----->  Returns the cumulative geometric probability for 0 up to x failures.

qgeom(pvalue, p)  -----> Inverse of the pgeom() function.
Returns the smallest x at which the cumulative probability reaches or exceeds pvalue (quantiles).

rgeom(n, p)  ----->  Returns n random deviates (numbers of failures) from a geometric distribution
with the probability of success p.




### Generating the probability mass function of geometric distribution

## x is the trial number of the first success; dgeom() expects the number of failures, x-1
x = seq(1,10)
p = 0.3
y =  dgeom(x-1,p)
plot(x,y,type="h", col="red", lwd=2, xlab="Trial number x that resulted in first success", ylab = "Geometric probability for x", font.lab=2, main="Probability mass function of geometric distribution")

## Computing cumulative probability of up to 4 failures (first success within 5 trials)
p = 0.2
x = 4
prob = pgeom(x,p)
print(paste("Cumulative probability of up to 4 failures before first success = ", round(prob, digits=3)))

## Computing number of failures x at which cumulative probability crosses q
p = 0.2
pcumul = 0.738
xval = qgeom(pcumul, p)
print(paste("number of failures x at which cumulative probability crosses 0.738 = ", xval))

## Generating 6 random deviates (numbers of failures) from geometric distribution
p = 0.4
x = rgeom(6, p)
print("some random deviates from geometric distribution with p=0.4 : ")
print(x)



Running the above script in R prints the following output lines and graph on the screen. The random deviates differ from run to run, and a value of 0 means that the very first trial was a success.


 "Cumulative probability of up to 4 failures before first success =  0.672"
 "number of failures x at which cumulative probability crosses 0.738 =  6"
 "some random deviates from geometric distribution with p=0.4 : "
 5 2 2 0 1 3
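Because R's functions count failures while the formulas in this section count trials, it can be convenient to wrap them. The wrappers below are hypothetical helpers written for this illustration, not part of R; they shift the argument or the result by one so that `x` means the trial number of the first success.

```r
p <- 0.2

# Hypothetical wrappers that work with the trial number x = 1, 2, 3, ...
dgeom_trial <- function(x, p) dgeom(x - 1, p)
pgeom_trial <- function(x, p) pgeom(x - 1, p)
qgeom_trial <- function(prob, p) qgeom(prob, p) + 1

print(dgeom_trial(1, p))      # P(first success on trial 1), which equals p
print(pgeom_trial(4, p))      # P(first success within 4 trials) = 1 - (1 - p)^4
print(qgeom_trial(0.738, p))  # smallest trial number whose cumulative probability reaches 0.738
```

In this convention, the quantile for 0.738 is one more than the number of failures reported by `qgeom()`.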