Biostatistics with R

The confidence interval for the population variance

In an earlier section, while sampling from a normal distribution, we learnt to construct a confidence interval on a population mean based on the sample mean.


Once the sample set X of size n is measured and sample mean \(\small{\overline{x}}\) is computed, we can construct an interval \(\small{\overline{x} \pm Z_{1-\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)}\) around this mean.

We can say with \(\small{(1-\alpha)\times 100\% }\) confidence that the unknown population mean \(\small{\mu }\) is in this interval, provided we know the population standard deviation \(\small{\sigma }\).

Arguing along similar lines, a \(\small{100(1-\alpha)\%}\) confidence interval for the population variance \(\small{\sigma^2 }\) can be constructed around the sample variance \(\small{s^2 }\) computed from n data points drawn from a normal distribution.

The sample variance is computed from the n data points using
$$ \small{ s^2 = \dfrac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2 } $$

From the properties of the Chi-square distribution, we know that the quantity \(\small{\dfrac{(n-1)s^2}{\sigma^2} }\) follows \(\small{\chi^2(n-1) }\), a Chi-square distribution with (n-1) degrees of freedom.

On a Chi-square distribution with (n-1) degrees of freedom, we choose two points \(\small{a}\) and \(\small{b }\) for the Chi-square variable such that the area under the curve (probability) between these two points is \(\small{1-\alpha}\).

This means \(~~\small{b = \chi^2_{1-\alpha/2}(n-1) }~~\) and \(~~\small{a = \chi^2_{\alpha/2}(n-1)~~ }\), i.e., the area from 0 to \(\small{b}\) is \(\small{1-\alpha/2 }\) and the area from 0 to \(\small{a}\) is \(\small{\alpha/2 }\). Each of the two tail regions therefore has an area of \(\small{\alpha/2 }\), and the area under the \(\small{\chi^2(n-1)}\) curve between the points \(\small{a}\) and \(\small{b}\) is \(\small{1-\alpha }\). See the figure below.
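As a numerical check of these areas, the two cut points can be obtained in R from the Chi-square quantile function `qchisq()`, which works with lower-tail areas by default. The following is a minimal sketch; the values n = 14 and \(\small{\alpha = 0.05}\) are assumptions chosen only for illustration.

```r
# Illustrative values (assumptions, not prescribed by the text)
n     <- 14
alpha <- 0.05
df    <- n - 1

# qchisq() takes the area to the LEFT of the quantile by default
a <- qchisq(alpha / 2, df)        # area from 0 to a is alpha/2
b <- qchisq(1 - alpha / 2, df)    # area from 0 to b is 1 - alpha/2

# Area under the chi-square(df) curve between a and b
area <- pchisq(b, df) - pchisq(a, df)

print(c(a = a, b = b, area = area))
```

The printed `area` should equal \(\small{1-\alpha}\) up to floating-point error, whatever values of n and \(\small{\alpha}\) are used.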

We can therefore write,

\(~~~~~~~~~~\small{1~-~\alpha~~=~~P\left( a~\leq~\dfrac{(n-1)s^2}{\sigma^2}~\leq~b \right) }\)

\(~~~~~~~~~~~~~~~~~~~~~~\small{=~P\left(\dfrac{a}{(n-1)s^2}~\leq~\dfrac{1}{\sigma^2}~\leq~\dfrac{b}{(n-1)s^2} \right) }\)

\(~~~~~~~~~~~~~~~~~~~~~~\small{=~P\left(\dfrac{(n-1)s^2}{b}~\leq~\sigma^2~\leq~\dfrac{(n-1)s^2}{a} \right) }\)
Thus the probability that the random interval \(\small{\left[\dfrac{(n-1)s^2}{b},~\dfrac{(n-1)s^2}{a} \right]}\) contains the unknown population variance \(\small{\sigma^2}\) is \(\small{1-\alpha}\).
Once we estimate the sample variance \(\small{s^2}\) from n data points, we can construct a \(\small{100(1-\alpha)\% }\) confidence interval for the unknown population variance \(\small{\sigma^2 }\) as
\(~~~~~~~~~~~~\small{\left[\dfrac{(n-1)s^2}{b},~\dfrac{(n-1)s^2}{a} \right]}\)
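The statement that the random interval contains \(\small{\sigma^2}\) with probability \(\small{1-\alpha}\) can be checked by simulation. The sketch below repeatedly draws normal samples with a known variance and counts how often the interval covers it. The sample size, the true variance, the seed and the number of repetitions are all arbitrary assumptions made for this illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
n      <- 14                # illustrative sample size
sigma2 <- 4                 # "true" variance, known only because we simulate
alpha  <- 0.05

a <- qchisq(alpha / 2, n - 1)       # lower cut point
b <- qchisq(1 - alpha / 2, n - 1)   # upper cut point

# For each simulated sample, build the interval and test whether it covers sigma2
covered <- replicate(10000, {
  s2    <- var(rnorm(n, mean = 0, sd = sqrt(sigma2)))
  lower <- (n - 1) * s2 / b
  upper <- (n - 1) * s2 / a
  lower <= sigma2 && sigma2 <= upper
})

coverage <- mean(covered)   # fraction of intervals that contain sigma2
print(coverage)
```

The observed coverage should be close to 0.95, with small fluctuations from one seed to another.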

From this it follows that a \(\small{100(1-\alpha)\% }\) confidence interval for the population standard deviation \(\small{\sigma }\) is given by,
\(~~~~~~~~~~~~\small{\left[\sqrt{\dfrac{n-1}{b}}s,~\sqrt{\dfrac{(n-1)}{a}}s \right]}\)
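The two intervals above can be wrapped in a small R function. This is only a sketch: the function name `sigma_ci` and its interface are hypothetical, and the data are simulated rather than measured.

```r
# Hypothetical helper (name and interface are assumptions for illustration)
sigma_ci <- function(x, conf.level = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf.level
  s2    <- var(x)                          # sample variance, divisor n - 1
  a     <- qchisq(alpha / 2, n - 1)        # lower cut point
  b     <- qchisq(1 - alpha / 2, n - 1)    # upper cut point
  var_ci <- c(lower = (n - 1) * s2 / b, upper = (n - 1) * s2 / a)
  # Taking square roots of the variance limits gives the limits for sigma
  list(variance = var_ci, sd = sqrt(var_ci))
}

set.seed(42)                               # arbitrary seed
x   <- rnorm(20, mean = 10, sd = 2)        # simulated data, not real measurements
res <- sigma_ci(x)
print(res)
```

Note that the interval is not symmetric around \(\small{s^2}\), because the Chi-square distribution is skewed.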



Example-1 : The weights of a random sample of 14 male rats in a laboratory were measured. Their weights in grams are reported here:

\(~~~~~~~\small{ 345.3, 398.4, 391.3, 450.3, 446.7, 393.1, 342.4 }\)
\(~~~~~~~\small{ 401.0, 429.6, 427.8, 446.4, 438.7, 374.2, 423.2 }\)

Assuming that the weights of the rats follow a normal distribution, estimate a \(\small{95\%}\) confidence interval for the population standard deviation.


Solution :
Since we have n=14 data points, the degrees of freedom n-1 = 13.
Also \(\small{\alpha = 0.05}\) and hence \(\small{\alpha/2 = 0.025 }\)

From the given data, we estimate the sample variance \(\small{s^2 = 1293.2 }\), so the sample standard deviation is \(\small{s = 35.96 }\). Now we need the values of the critical points a and b for a \(\small{ 95\% }\) confidence level. The Chi-square table used here gives the area under the curve (probability) above a critical value, so its subscripts denote upper-tail areas. In that convention we read \(\small{a = \chi^2_{0.975}(13) = 5.009 }\) and \(\small{b = \chi^2_{0.025}(13) = 24.736 }\); that is, the area to the right of a is 0.975 (the area from 0 to a is 0.025) and the area to the right of b is 0.025.

Therefore, the \(\small{95\%}\) confidence interval for the unknown population standard deviation \(\small{\sigma }\) is given by,
\(\small{ \left[\sqrt{\dfrac{13}{24.736}}\times 35.96,~~ \sqrt{\dfrac{13}{5.009}}\times 35.96\right] ~~ = ~~ \left[ 26.1, 57.9 \right] }\)
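The same calculation can be reproduced in R. The sketch below uses `lower.tail = FALSE` so that the quantile lookups follow the upper-tail convention of the table; the variable names are arbitrary. Small differences in the last digit of the limits can arise if \(\small{s}\) is rounded before multiplying.

```r
# Weights (in grams) of the 14 rats from Example-1
weights <- c(345.3, 398.4, 391.3, 450.3, 446.7, 393.1, 342.4,
             401.0, 429.6, 427.8, 446.4, 438.7, 374.2, 423.2)

n <- length(weights)   # number of data points
s <- sd(weights)       # sample standard deviation (divisor n - 1)

# Upper-tail lookups, matching the table convention:
# the area to the right of a is 0.975, the area to the right of b is 0.025
a <- qchisq(0.975, n - 1, lower.tail = FALSE)
b <- qchisq(0.025, n - 1, lower.tail = FALSE)

# 95% confidence interval for the population standard deviation
ci <- c(lower = sqrt((n - 1) / b) * s, upper = sqrt((n - 1) / a) * s)

print(round(c(s = s, a = a, b = b), 3))
print(round(ci, 1))
```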