Mathematical tools for natural sciences

Consider two normal distributions \(\small{N(\mu_X, \sigma_X^2)}\) and \(\small{N(\mu_Y, \sigma_Y^2)}\). We are interested in comparing the population variances \(\small{\sigma_X^2}\) and \(\small{\sigma_Y^2}\).

Suppose we draw random samples of sizes n and m from the two distributions X and Y respectively, and compute their sample variances \(\small{S_X^2}\) and \(\small{S_Y^2}\). Using these estimates, we can derive a confidence interval for the ratio \(\small{\dfrac{\sigma_X^2}{\sigma_Y^2}}\) as follows:

According to the definition of the F distribution, the ratio \(\small{ \dfrac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2} }\) follows an F distribution with \(\small{r_1 = m-1}\) and \(\small{r_2 = n-1 }\) degrees of freedom.

(Note : n is the number of samples in X and m is the number of samples in Y )
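The distributional claim above implies that the ratio does not depend on the actual values of \(\small{\sigma_X}\) and \(\small{\sigma_Y}\). The following is a minimal simulation sketch of that property, using only the Python standard library. The sample sizes (n = 12, m = 15), the two pairs of standard deviations, the number of trials and the helper names are all assumptions chosen for illustration.

```python
import random

def sample_variance(values):
    """Unbiased sample variance (divisor len(values) - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def simulate_ratio(n, m, sigma_x, sigma_y, trials, rng):
    """Simulate (S_Y^2 / sigma_Y^2) / (S_X^2 / sigma_X^2) for normal samples.

    Returns the simulated ratios in ascending order.
    """
    ratios = []
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
        y = [rng.gauss(0.0, sigma_y) for _ in range(m)]
        ratios.append((sample_variance(y) / sigma_y**2) /
                      (sample_variance(x) / sigma_x**2))
    ratios.sort()
    return ratios

def empirical_quantile(sorted_values, p):
    """Crude empirical quantile: the element at relative position p."""
    index = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

rng = random.Random(0)  # fixed seed so the run is reproducible
ratios_a = simulate_ratio(12, 15, 1.0, 1.0, 20000, rng)
ratios_b = simulate_ratio(12, 15, 2.0, 5.0, 20000, rng)

# The two columns should agree up to simulation noise, and both should be
# close to the corresponding quantiles of F(m - 1, n - 1) = F(14, 11).
for p in (0.025, 0.5, 0.975):
    print(p, empirical_quantile(ratios_a, p), empirical_quantile(ratios_b, p))
```

Because the ratio behaves the same way for any choice of the two standard deviations, its quantiles can be used to bracket the unknown ratio of variances, which is what the derivation below does.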

For a given significance level \(\small{\alpha}\), choose two values \(\small{F_1}\) and \(\small{F_2}\) on the F distribution such that the area under the F curve between \(\small{F_1}\) and \(\small{F_2}\) is \(\small{1-\alpha}\), with an area of \(\small{\alpha/2}\) in each tail. We therefore let,

\(\small{ F_1 = F_{\alpha/2}(m-1,n-1)~~ }\) and \(~~\small{F_2 = F_{1-\alpha/2}(m-1, n-1) }\)

(The area under the curve from 0 to \(\small{F_1}\) is \(\small{\alpha/2}\) and from 0 to \(\small{F_2}\) is \(\small{1-\alpha/2}\) )
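In practice \(\small{F_1}\) and \(\small{F_2}\) are read from F tables or obtained from statistical software. As a self-contained sketch of what such a lookup does, the code below evaluates the F cumulative distribution through the regularized incomplete beta function (computed with a standard continued-fraction expansion) and inverts it by bisection. The function names, tolerances and the degrees of freedom in the usage lines (14 and 11) are assumptions made for illustration; a tested statistics library is preferable for real work.

```python
import math

def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b

def f_cdf(x, d1, d2):
    """Area under the F(d1, d2) curve from 0 to x."""
    if x <= 0.0:
        return 0.0
    return regularized_incomplete_beta(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))

def f_quantile(p, d1, d2, tol=1e-10):
    """Value F_p(d1, d2) such that the area from 0 to F_p equals p (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:  # expand the bracket until it contains F_p
        hi *= 2.0
    while hi - lo > tol:  # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = 0.05
f1 = f_quantile(alpha / 2, 14, 11)      # F_{alpha/2}(m-1, n-1)
f2 = f_quantile(1 - alpha / 2, 14, 11)  # F_{1-alpha/2}(m-1, n-1)
print(f1, f2)
```

A useful consistency check on any such routine is the reciprocal relation \(\small{F_{\alpha/2}(r_1, r_2) = 1 / F_{1-\alpha/2}(r_2, r_1)}\).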

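and can write the probability statement below. The second line follows by multiplying all three parts of the inequality by the positive quantity \(\small{S_X^2/S_Y^2}\):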

\(~~~~~~~~~~~~\small{1 - \alpha~=~ P\left( F_1~\leq~\dfrac{S_Y^2/\sigma_Y^2}{S_X^2/\sigma_X^2}~\leq~F_2 \right) }\)

\(~~~~~~~~~~~~~~\small{~~~~~~~=~P\left(F_1 \dfrac{S_X^2}{S_Y^2}~\leq~\dfrac{\sigma_X^2}{\sigma_Y^2}~\leq~F_2 \dfrac{S_X^2}{S_Y^2} \right)}\)

\(~~~~~~~~~~~~~~\small{~~~~~~~=~P\left(F_{\alpha/2}(m-1,n-1) \dfrac{S_X^2}{S_Y^2}~\leq~\dfrac{\sigma_X^2}{\sigma_Y^2}~\leq~F_{1-\alpha/2}(m-1, n-1) \dfrac{S_X^2}{S_Y^2} \right)}\)

Therefore, the interval

\(\small{\left[ F_{\alpha/2}(m-1,n-1) \dfrac{S_X^2}{S_Y^2}, ~~ F_{1-\alpha/2}(m-1, n-1) \dfrac{S_X^2}{S_Y^2} \right] }~~~~~\)

is a \(\small{100(1-\alpha)\%}\) confidence interval for the ratio of population variances \(\small{\dfrac{\sigma_X^2}{\sigma_Y^2} }\).

It follows that the interval,

\(\small{\left[ \sqrt{F_{\alpha/2}(m-1,n-1)} \dfrac{S_X}{S_Y}, ~~ \sqrt{F_{1-\alpha/2}(m-1, n-1)} \dfrac{S_X}{S_Y} \right] }~~~~~\)

is a \(\small{100(1-\alpha)\%}\) confidence interval for the ratio \(\small{\dfrac{\sigma_X}{\sigma_Y} }\) of population standard deviations.

__ Important Note : __
The confidence interval for the ratio of variances given above is exact only when both
underlying distributions are Gaussian. If the distributions are non-Gaussian, the estimated
confidence interval for the ratio of variances can be inaccurate and error prone.
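The two interval formulas above can be sketched as a pair of small helper functions. The function names are hypothetical, the F values are assumed to have been looked up beforehand, and the numbers in the usage lines are arbitrary round values chosen only to make the arithmetic easy to follow (they are not real F quantiles).

```python
import math

def variance_ratio_interval(s2_x, s2_y, f_lower, f_upper):
    """Confidence limits for sigma_X^2 / sigma_Y^2.

    s2_x, s2_y : sample variances S_X^2 and S_Y^2
    f_lower    : F_{alpha/2}(m-1, n-1)
    f_upper    : F_{1-alpha/2}(m-1, n-1)
    """
    ratio = s2_x / s2_y
    return f_lower * ratio, f_upper * ratio

def sd_ratio_interval(s2_x, s2_y, f_lower, f_upper):
    """Confidence limits for sigma_X / sigma_Y (square roots of the above)."""
    lower, upper = variance_ratio_interval(s2_x, s2_y, f_lower, f_upper)
    return math.sqrt(lower), math.sqrt(upper)

# Illustrative inputs only: S_X^2 = 4, S_Y^2 = 8, F values 0.5 and 2.
print(variance_ratio_interval(4.0, 8.0, 0.5, 2.0))  # (0.25, 1.0)
print(sd_ratio_interval(4.0, 8.0, 0.5, 2.0))        # (0.5, 1.0)
```

Note the order of the degrees of freedom: both F values use (m-1, n-1), that is, the sample size of Y first, even though the interval is for the ratio with X in the numerator.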

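Example: consider the following two data sets X and Y, assumed to be drawn from normal distributions.

\(~~~~~~~~~~~~\small{ X = \{ 9.1, 12.5, 10.2, 9.5, 7.3, 5.6, 10.1, 13.0, 12.8, 9.0, 7.9, 7.7 \} }\)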

\(~~~~~~~~~~~~\small{ Y = \{11.6, 21.0, 20.9, 7.1, 15.9, 15.6, 17.9, 10.3, 16.5, 17.4, 15.7, 17.1, 13.5, 12.7, 19.0 \} }\)

Find a two-sided \(\small{95\% }\) confidence interval for the ratio of population variances \(\small{\dfrac{\sigma_X^2}{\sigma_Y^2 } }\).

From the data, n = 12, m = 15, \(\small{S_X = 2.32 }\) and \(\small{S_Y = 3.87~~ }\), \(~~\small{\dfrac{S_X}{S_Y} = 0.5998}~~~\), and \(\small{\dfrac{S_X^2}{S_Y^2} = 0.3597 }\)
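These sample statistics can be reproduced with a few lines of Python. This is a minimal sketch using the standard `statistics` module, whose `stdev` and `variance` functions use the n - 1 divisor; the variable names are chosen here for illustration.

```python
import statistics

x = [9.1, 12.5, 10.2, 9.5, 7.3, 5.6, 10.1, 13.0, 12.8, 9.0, 7.9, 7.7]
y = [11.6, 21.0, 20.9, 7.1, 15.9, 15.6, 17.9, 10.3, 16.5, 17.4,
     15.7, 17.1, 13.5, 12.7, 19.0]

n, m = len(x), len(y)    # sample sizes of X and Y
s_x = statistics.stdev(x)  # sample standard deviation S_X
s_y = statistics.stdev(y)  # sample standard deviation S_Y
sd_ratio = s_x / s_y       # S_X / S_Y
var_ratio = statistics.variance(x) / statistics.variance(y)  # S_X^2 / S_Y^2

print(n, m)
print(round(s_x, 2), round(s_y, 2))
print(round(sd_ratio, 4), round(var_ratio, 4))
```

Small differences in the last decimal place relative to hand calculations come from rounding intermediate values.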

Also, \(\small{\alpha = 0.05 }\) and hence \(\small{\alpha/2 = 0.025 }\)

We find \(\small{F_{\alpha/2}(m-1, n-1) = F_{0.025}(14, 11) = 0.3231 }\)

and, \(~~~\small{F_{1 - \alpha/2}(m-1, n-1) = F_{0.975}(14, 11) \approx 3.36 }\)

With these numbers, we can write a \(\small{95\%}\) confidence interval for \(\small{\dfrac{\sigma_X^2}{\sigma_Y^2 } }\) as,

\(\small{\left[ F_{\alpha/2}(m-1,n-1) \dfrac{S_X^2}{S_Y^2}, ~~ F_{1-\alpha/2}(m-1, n-1) \dfrac{S_X^2}{S_Y^2} \right] ~ = ~[ 0.3231 \times 0.3597, ~3.36 \times 0.3597]~=~[0.116, 1.21 ] }\)

We note that the point estimate \(\small{\dfrac{S_X^2}{S_Y^2} = 0.3597} \) lies inside this confidence interval.