confidence interval for the variance

Biostatistics with R

The confidence interval for the population variance

In an earlier section, while sampling from a normal distribution, we learnt to construct a confidence interval on a population mean based on the sample mean.

Once the sample set X of size n is measured and sample mean $\small{\overline{x}}$ is computed, we can construct an interval $\small{\overline{x} \pm Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)}$ around this mean.

We can say with $\small{(1-\alpha)\times 100\% }$ confidence that the unknown population mean $\small{\mu }$ lies in this interval, provided we know the population standard deviation $\small{\sigma }$.
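As a quick illustration of the interval for the mean recalled above, here is a minimal Python sketch using only the standard library. The function name `mean_ci_known_sigma` and the input numbers are hypothetical, chosen only to show the arithmetic; they do not come from any data set in this section.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, alpha=0.05):
    """Return (lower, upper) for xbar +/- Z_{1-alpha/2} * sigma / sqrt(n)."""
    # Standard normal quantile Z_{1-alpha/2}
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs: sample mean 50, known sigma 10, n = 25
lower, upper = mean_ci_known_sigma(xbar=50.0, sigma=10.0, n=25, alpha=0.05)
print(round(lower, 2), round(upper, 2))
```

The interval is symmetric about the sample mean because the normal distribution is symmetric; the interval for the variance derived next is not.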

Arguing on similar lines, a $\small{100(1-\alpha)\%}$ confidence interval for the population variance $\small{\sigma^2 }$ can be defined around the sample variance $\small{s^2 }$ based on n data points from a normal distribution.

The sample variance is computed from the n data points using

$$ \small{ s^2 = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2 } $$

From the properties of the Chi-square distribution, we know that the quantity $\small{\dfrac{(n-1)s^2}{\sigma^2} }$ follows $\small{\chi^2(n-1) }$, a Chi-square distribution with (n-1) degrees of freedom.
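The claim that $\small{(n-1)s^2/\sigma^2}$ follows a Chi-square distribution with (n-1) degrees of freedom can be checked informally by simulation. The sketch below is an assumption-laden illustration, not a proof: the sample size, the population standard deviation and the number of trials are hypothetical choices. It relies on the known facts that a $\small{\chi^2(k)}$ variable has mean $\small{k}$ and variance $\small{2k}$.

```python
import random
from statistics import mean, variance

random.seed(1)            # fixed seed so the run is repeatable
n, sigma = 14, 2.0        # hypothetical sample size and population sd
trials = 20000            # hypothetical number of simulated samples

q_values = []
for _ in range(trials):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    s2 = variance(sample)                      # sample variance, divides by n-1
    q_values.append((n - 1) * s2 / sigma ** 2)

q_mean = mean(q_values)      # should be close to n-1 = 13
q_var = variance(q_values)   # should be close to 2(n-1) = 26
print(q_mean, q_var)
```

The simulated mean and variance should land near 13 and 26, up to sampling noise.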

On a Chi-square distribution with (n-1) degrees of freedom, we choose two points $\small{a}$ and $\small{b }$ for the Chi-square variable such that the area under the curve (the probability) between these two points is $\small{1-\alpha}$.

This means $~~\small{b = \chi^2_{1-\alpha/2}(n-1) }~~$ and $~~\small{a = \chi^2_{\alpha/2}(n-1) }~~$, i.e., the area from 0 to $\small{b}$ is $\small{1-\alpha/2 }$ and the area from 0 to $\small{a}$ is $\small{\alpha/2 }$. Each of the two tail regions therefore has an area of $\small{\alpha/2 }$, so the area under the $\small{\chi^2(n-1)}$ curve between the points $\small{a}$ and $\small{b}$ is $\small{1-\alpha }$. See the figure below.
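The points $\small{a}$ and $\small{b}$ are usually read from a table or obtained from statistical software. As a rough sketch of what such a lookup does, the Python code below (standard library only) evaluates the Chi-square cumulative probability with the series for the regularized lower incomplete gamma function and inverts it by bisection. The helper names `chi2_cdf` and `chi2_quantile` are hypothetical, and the numerical method is one simple choice among several, not the one used by any particular package. The values n = 14 and $\small{\alpha = 0.05}$ are those of Example-1 in this section.

```python
from math import exp, lgamma, log


def chi2_cdf(x, df):
    """Area from 0 to x under the chi-square(df) curve.

    Uses the series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for moderate x and df.
    """
    if x <= 0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 1
    while term > 1e-16 * total:       # stop once terms are negligible
        term *= z / (s + k)
        total += term
        k += 1
    return total * exp(s * log(z) - z - lgamma(s))


def chi2_quantile(p, df):
    """Point q such that the area from 0 to q equals p (bisection)."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:       # widen until the target is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha, n = 0.05, 14
a = chi2_quantile(alpha / 2, n - 1)        # area from 0 to a is alpha/2
b = chi2_quantile(1 - alpha / 2, n - 1)    # area from 0 to b is 1 - alpha/2
print(round(a, 3), round(b, 3))
```

For 13 degrees of freedom this should agree with the tabulated values 5.009 and 24.736 to the three decimals shown in the table.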

We can therefore write,

$~~~~~~~~~~\small{1~-~\alpha~~=~~P\left( a~\leq~\dfrac{(n-1)s^2}{\sigma^2}~\leq~b \right) }$

$~~~~~~~~~~~~~~~~~~~~~~\small{=~P\left(\dfrac{a}{(n-1)s^2}~\leq~\dfrac{1}{\sigma^2}~\leq~\dfrac{b}{(n-1)s^2} \right) }$

$~~~~~~~~~~~~~~~~~~~~~~\small{=~P\left(\dfrac{(n-1)s^2}{b}~\leq~\sigma^2~\leq~\dfrac{(n-1)s^2}{a} \right) }$

Thus the probability that the random interval $\small{\left[\dfrac{(n-1)s^2}{b},~\dfrac{(n-1)s^2}{a} \right]}$ contains the unknown population variance $\small{\sigma^2}$ is $\small{1-\alpha}$.

Once we estimate the sample variance $\small{s^2}$ from n data points, we can construct a $\small{100(1-\alpha)\% }$ confidence interval for the unknown population variance $\small{\sigma^2 }$ as

$~~~~~~~~~~~~\small{\left[\dfrac{(n-1)s^2}{b},~\dfrac{(n-1)s^2}{a} \right]}$
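The statement that the random interval contains $\small{\sigma^2}$ with probability $\small{1-\alpha}$ can be illustrated with a small coverage simulation. This is only a sketch under stated assumptions: the true mean and variance and the number of trials are hypothetical, and the critical points 5.009 and 24.736 are the tabulated values for 13 degrees of freedom and $\small{\alpha = 0.05}$ used in Example-1.

```python
import random
from statistics import variance

random.seed(7)               # fixed seed so the run is repeatable
n = 14
a, b = 5.009, 24.736         # chi-square(13) critical points for alpha = 0.05
sigma2 = 4.0                 # hypothetical true population variance
trials = 20000               # hypothetical number of simulated samples

hits = 0
for _ in range(trials):
    sample = [random.gauss(10.0, sigma2 ** 0.5) for _ in range(n)]
    s2 = variance(sample)
    lower = (n - 1) * s2 / b
    upper = (n - 1) * s2 / a
    if lower <= sigma2 <= upper:     # did this interval capture sigma^2?
        hits += 1

coverage = hits / trials
print(coverage)
```

The printed fraction should be close to 0.95: about 95 out of every 100 intervals built this way capture the true variance, which is what the confidence level means.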

From this it follows that a $\small{100(1-\alpha)\% }$ confidence interval for the population standard deviation $\small{\sigma }$ is given by,
$~~~~~~~~~~~~\small{\left[\sqrt{\dfrac{n-1}{b}}~s,~\sqrt{\dfrac{n-1}{a}}~s \right]}$
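The two interval formulas above translate directly into code. In this minimal Python sketch the function names `variance_ci` and `sd_ci` are hypothetical, the critical points a and b are passed in by the caller (for instance from a Chi-square table), and the sample variance 9.0 is an invented number used only to exercise the functions.

```python
from math import sqrt


def variance_ci(s2, n, a, b):
    """Interval [(n-1)s^2/b, (n-1)s^2/a] for the population variance."""
    return (n - 1) * s2 / b, (n - 1) * s2 / a


def sd_ci(s2, n, a, b):
    """Interval for the population standard deviation: the square
    roots of the endpoints of the variance interval."""
    lower, upper = variance_ci(s2, n, a, b)
    return sqrt(lower), sqrt(upper)


# Hypothetical sample variance with n = 14; a and b are the
# chi-square(13) critical points for alpha = 0.05
var_lo, var_hi = variance_ci(9.0, 14, 5.009, 24.736)
sd_lo, sd_hi = sd_ci(9.0, 14, 5.009, 24.736)
print(var_lo, var_hi)
print(sd_lo, sd_hi)
```

Note that the larger critical point b produces the lower endpoint and the smaller point a produces the upper endpoint, because both appear in the denominator.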

Example-1 : The weights of a random sample of 14 male rats in a laboratory were measured. Their weights in grams are reported here:

$~~~~~~~\small{ 345.3, 398.4, 391.3, 450.3, 446.7, 393.1, 342.4 }$
$~~~~~~~\small{ 401.0, 429.6, 427.8, 446.4, 438.7, 374.2, 423.2 }$

Assuming that the weight of the rat follows a normal distribution, estimate a $\small{95\%}$ confidence interval for the population standard deviation.

Solution :
Since we have n=14 data points, the degrees of freedom n-1 = 13.
Also $\small{\alpha = 0.05}$ and hence $\small{\alpha/2 = 0.025 }$

From the given data, we estimate the sample variance $\small{s^2 \approx 1293.2 }$, so the sample standard deviation is $\small{s \approx 35.96 }$. Next we need the critical points a and b for a $\small{ 95\% }$ confidence level. The Chi-square table used here gives the area under the curve (probability) above a critical value, so its subscripts denote upper-tail areas. In that notation we read $\small{a = \chi^2_{0.975}(13) = 5.009 }$ and $\small{b = \chi^2_{0.025}(13) = 24.736 }$, which means the area from 0 to a is 0.025 and the area from 0 to b is 0.975.

Therefore, the $\small{95\%}$ confidence interval for the unknown population standard deviation $\small{\sigma }$ is given by,
$\small{ \left[\sqrt{\dfrac{13}{24.736}}\times 35.96,~~ \sqrt{\dfrac{13}{5.009}}\times 35.96\right] ~~ \approx ~~ \left[ 26.1, 57.9 \right] }$
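The arithmetic of Example-1 can be reproduced with a few lines of Python (standard library only). This is a sketch rather than a prescribed procedure: the variable names are hypothetical, and the critical points are typed in from the Chi-square table values quoted in the solution instead of being computed.

```python
from math import sqrt
from statistics import mean, variance

# Weights (grams) of the 14 male rats from Example-1
weights = [345.3, 398.4, 391.3, 450.3, 446.7, 393.1, 342.4,
           401.0, 429.6, 427.8, 446.4, 438.7, 374.2, 423.2]

n = len(weights)
xbar = mean(weights)
s2 = variance(weights)       # sample variance, divides by n-1
s = sqrt(s2)                 # sample standard deviation

# Tabulated chi-square(13) critical points for a 95% interval
a, b = 5.009, 24.736

sd_lower = sqrt((n - 1) / b) * s
sd_upper = sqrt((n - 1) / a) * s
print(round(s2, 1), round(s, 2))
print(round(sd_lower, 1), round(sd_upper, 1))
```

Carrying the full precision of s (about 35.96) through the calculation gives an interval of roughly 26.1 to 57.9 grams for the population standard deviation.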

CountBio

Mathematical tools for natural sciences