## Confidence interval on population mean

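According to the Central Limit Theorem, the sample mean $\small{\overline{x}}$ estimated from n random samples from a distribution with mean $\small{\mu}$ and standard deviation $\small{\sigma}$ will, for sufficiently large n, be approximately unit normal under the transformation,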

$~~~~~~~~~~~~~~~~~\small{Z = \dfrac{\overline{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} ~~~is~~~N(0,1) }$
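The statement above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes an Exponential(1) parent distribution (for which $\small{\mu = 1}$ and $\small{\sigma = 1}$), and the function name, sample size and number of trials are illustrative choices. If the transformation behaves as claimed, the Z values should have a mean close to 0 and a standard deviation close to 1.

```python
import random
import statistics

def standardized_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from Exponential(1) and return their Z values."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    mu, sigma = 1.0, 1.0               # mean and standard deviation of Exponential(rate=1)
    z_values = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Z = (xbar - mu) / (sigma / sqrt(n))
        z_values.append((xbar - mu) / (sigma / n ** 0.5))
    return z_values

z_values = standardized_sample_means(n=50, trials=5000)
z_mean = statistics.fmean(z_values)
z_std = statistics.pstdev(z_values)
print(f"mean of Z = {z_mean:.3f}, standard deviation of Z = {z_std:.3f}")
```

The parent distribution here is deliberately non-normal, so the agreement with N(0,1) is only approximate and improves as n grows.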

See figure below:

For a given set of values for $\small{\overline{x}, \mu~and~\sigma}$, let $\small{\alpha}$ be the total probability of getting Z values outside a symmetric pair of thresholds, split equally between the upper and lower tails.

Let, $Z_{1-\alpha/2} ~$ be the Z value above which the area under the curve is $\small{\dfrac{\alpha}{2} }$. This means, $~\small{P(Z \gt Z_{1-\alpha/2}) = \dfrac{\alpha}{2}}$

Similarly, let $-Z_{1-\alpha/2} ~$ be the Z value below which the area under the curve is $\small{\dfrac{\alpha}{2} }$, i.e., $~\small{P(Z \lt -Z_{1-\alpha/2}) = \dfrac{\alpha}{2}}$

The probability for a Z value in the range $\small{[-Z_{1-\alpha/2}, Z_{1-\alpha/2}]}$ is $\small{1-\alpha }$. Therefore, we write this as,

$~~~~~~~~\small{P(-Z_{1-\alpha/2} ~\leq ~Z \leq ~Z_{1-\alpha/2}) = 1-\alpha }$

Substituting the expression for the Z transform, we get

$~~~~~~~~\small{P(-Z_{1-\alpha/2} ~\leq ~\dfrac{\overline{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} \leq ~Z_{1-\alpha/2}) = 1-\alpha }$

We now consider the following inequality :

$~~~~~~~~\small{ -Z_{1-\alpha/2} ~\leq ~\dfrac{\overline{x} - \mu}{\left(\dfrac{\sigma}{\sqrt{n}}\right)} \leq ~Z_{1-\alpha/2} }$

Multiplying throughout by $\small{\dfrac{\sigma}{\sqrt{n}} }$, we get

$~~~~~~~~\small{ -Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) ~\leq ~\overline{x} - \mu \leq ~Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) }$

We add $\small{-\overline{x}}$ throughout to get

$~~~~~~~~\small{-\overline{x} -Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) ~\leq ~ -\mu ~ \leq ~-\overline{x} + Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) }$

Multiplying throughout by -1 reverses the inequalities, giving:

$~~~~~~~~\small{\overline{x} + Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) ~\geq ~ \mu ~ \geq ~\overline{x} - Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) }$

Reading this from right to left, we get the final result

$~~~~~~~~\small{\overline{x} - Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) ~\leq ~ \mu ~ \leq ~\overline{x} + Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) }$

With the above inequality, we can make this final statement:

$~~~~~~~~\small{P\left(\overline{x} - Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) ~\leq ~ \mu ~ \leq ~\overline{x} + Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)\right) = 1 - \alpha }$

The above result is a remarkable one. It says the following:

Once the sample set X is measured and sample mean $\small{\overline{x}}$ is computed, we can construct an interval $\small{\overline{x} \pm Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)}$ around this mean.

We can say with $\small{(1-\alpha)\times 100\% }$ confidence that the unknown population mean $\small{\mu }$ is in this interval, provided we know the population standard deviation $\small{\sigma }$.

The interval
$\small{\overline{x} \pm Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)}$

is called the confidence interval on the population mean.

For example, for a given data set, let us fix $\small{\alpha = 0.05}$. Then $~~\small{\dfrac{\alpha}{2} = 0.025 }$ and $~~\small{1 - \dfrac{\alpha}{2} = 0.975 }$. We thus have,

$\small{\overline{x} \pm Z_{0.975} \dfrac{\sigma}{\sqrt{n}}}$ as the $\small{95\% }$ confidence interval around the estimated mean $\small{\overline{x} }$. We have to get $\small{Z_{0.975} }$ from a Gaussian table.
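Instead of a printed table, the critical value can also be obtained numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` is an illustrative choice, not something defined in the text. It also checks that the area between $\small{-Z_{1-\alpha/2}}$ and $\small{Z_{1-\alpha/2}}$ is $\small{1-\alpha}$.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)   # the unit normal N(0,1)

def z_critical(alpha):
    """Return Z_{1-alpha/2}, the value with area alpha/2 above it under N(0,1)."""
    return std_normal.inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = z_critical(alpha)
    # Area under the curve between -z and +z; should equal 1 - alpha
    central_area = std_normal.cdf(z) - std_normal.cdf(-z)
    print(f"alpha = {alpha:.2f}   Z = {z:.3f}   central area = {central_area:.3f}")
```

For $\small{\alpha = 0.10,~0.05,~0.01}$ this gives critical values of about 1.645, 1.960 and 2.576, matching the usual Gaussian table entries.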

### Understanding the confidence interval on population mean

Suppose we randomly draw $\small{n}$ data points from a distribution with a given $\small{\mu}$ and $\small{\sigma}$. We compute, say, a $\small{95\%}$ confidence interval on the population mean for this sample data. The population mean $\small{\mu}$ may or may not be within this interval. Now we repeat this whole exercise m times (we call it "m experiments") where m is large. Every time, we pick n random samples, get the sample mean, compute the $\small{95\%}$ confidence interval and check whether the population mean $\small{\mu}$ is inside the interval. Note that the population mean $\small{\mu}$ is fixed, while the sample mean, and hence the confidence interval, will be different every time. We expect that out of m such experiments, about $95\%$ of the cases will have the population mean within the confidence interval and about $5\%$ of them will have the population mean outside the interval.

The confidence interval does not say that $\small{\mu}$ assumes a value within the interval with a probability 0.95. It does not say anything about the value of $\small{\mu}$.
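The repeated-experiment interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the parent distribution is taken to be normal, and the values $\small{\mu = 10}$, $\small{\sigma = 2}$, $\small{n = 25}$ and $\small{m = 10000}$ are arbitrary illustrative choices. The function counts the fraction of experiments in which the fixed $\small{\mu}$ falls inside the computed interval.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, m, alpha=0.05, seed=1):
    """Fraction of m experiments whose (1-alpha) interval contains the fixed mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - alpha / 2)        # Z_{1-alpha/2}
    half_width = z * sigma / n ** 0.5              # same for every experiment (sigma known)
    hits = 0
    for _ in range(m):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)                       # changes from experiment to experiment
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / m

fraction_inside = coverage(mu=10.0, sigma=2.0, n=25, m=10000)
print(f"fraction of intervals containing mu: {fraction_inside:.3f}")
```

The printed fraction should be close to 0.95, though not exactly equal to it, since m is finite.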

### Two sided and One sided confidence intervals

The confidence interval $\small{\overline{x} \pm Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)}$ is the interval within which we say, with $\small{(1-\alpha)100\%}$ confidence, that the population mean lies.

This expression gives a two sided confidence interval, since the given $\small{\alpha}$ is split between the lower and upper bounds of the interval.

Sometimes, we want to estimate only the lower or upper bound within which $\small{\mu}$ can be located with $\small{(1-\alpha)100\%}$ confidence. In this case, the entire probability $\small{\alpha}$ is assigned to one side.

A $\small{(1-\alpha)100\%}$ upper one sided confidence interval on $\small{\mu}$ has only an upper bound and is written as, $\small{\mu \leq \overline{x} + Z_{1-\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)}$

Similarly, a $\small{(1-\alpha)100\%}$ lower one sided confidence interval on $\small{\mu}$ has only a lower bound and is written as, $\small{\mu \geq \overline{x} - Z_{1-\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)}$

Thus, in the case of one sided interval, we compute $\small{Z_{1-\alpha}}$ instead of $\small{Z_{1-\alpha/2}}$.
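A minimal sketch of the one sided computation is given below. The function name and the numbers used ($\small{\overline{x} = 50}$, $\small{\sigma = 5}$, $\small{n = 25}$) are hypothetical values chosen for illustration only. Note the use of $\small{Z_{1-\alpha}}$ rather than $\small{Z_{1-\alpha/2}}$.

```python
from statistics import NormalDist

def one_sided_bounds(xbar, sigma, n, alpha=0.05):
    """Return (lower_bound, upper_bound) of the two one sided (1-alpha) intervals.

    Each bound is to be used on its own: mu >= lower_bound, or mu <= upper_bound.
    """
    z = NormalDist().inv_cdf(1 - alpha)    # Z_{1-alpha}: all of alpha in one tail
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

# Hypothetical numbers, for illustration only
lower, upper = one_sided_bounds(xbar=50.0, sigma=5.0, n=25, alpha=0.05)
print(f"lower one sided 95% bound: mu >= {lower:.3f}")
print(f"upper one sided 95% bound: mu <= {upper:.3f}")
```

Because $\small{Z_{0.95} \approx 1.645}$ is smaller than $\small{Z_{0.975} \approx 1.96}$, each one sided bound lies closer to $\small{\overline{x}}$ than the corresponding end of the two sided interval at the same confidence level.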

Example-1 :

In a food processing unit, a packaging machine prepares 53 gram packages of chocolate chips. In order to check the quality of packing, a random sample of 10 packages was pulled out from the assembly line and their weights were independently measured. The data is given below:

$~~~~~~~~~~\small{56.95, 57.54, 58.58, 56.13, 58.48, 57.06, 60.93, 59.30, 53.57, 59.46 }$

Assuming that the weights of these packets follow a normal distribution $\small{N(\mu, \sigma) }$ with known standard deviation $\small{\sigma = 4.1 }$, find the $\small{95\% }$ confidence interval on $\small{\mu }$.

We estimate the mean of the data points as, $\small{\overline{x} = 57.8 }$

We have, $\small{n=10, \sigma=4.1 }$

For a $\small{95\%}$ confidence interval, $\small{\alpha = 0.05 }$ and $\small{\dfrac{\alpha}{2} = 0.025 }$

For a two sided confidence interval with $\small{\alpha = 0.05 }$, we get, from Gaussian table, $\small{Z_{0.975} = 1.96 }$.

With this, we write the two sided confidence interval as, $\small{57.8 \pm 1.96 \times \dfrac{4.1}{\sqrt{10}} = 57.8 \pm 2.54 }$, that is, $\small{55.26 \leq \mu \leq 60.34 }$.
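The arithmetic of this example can be reproduced with a short script. This is a sketch using only Python's standard library; the variable names are illustrative, and $\small{\sigma = 4.1}$ is treated as the known population standard deviation, as in the calculation above.

```python
from statistics import NormalDist, fmean

# Measured package weights (grams) from Example-1
weights = [56.95, 57.54, 58.58, 56.13, 58.48, 57.06, 60.93, 59.30, 53.57, 59.46]
sigma = 4.1      # known population standard deviation (assumption stated in the example)
alpha = 0.05     # for a 95% confidence interval

n = len(weights)
xbar = fmean(weights)                          # sample mean
z = NormalDist().inv_cdf(1 - alpha / 2)        # Z_{0.975}
half_width = z * sigma / n ** 0.5              # Z * sigma / sqrt(n)

ci_low, ci_high = xbar - half_width, xbar + half_width
print(f"mean = {xbar:.2f}, half width = {half_width:.2f}")
print(f"95% confidence interval: [{ci_low:.2f}, {ci_high:.2f}]")
```

The script reports a mean of 57.80 and a half width of 2.54, giving the interval [55.26, 60.34].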