Mathematical tools for natural sciences

Sometimes we would like to test the equality of the population variances of two normal distributions from which random samples are drawn. For example, before applying Student's t-test to two sets of samples, we need to test whether their variances are equal. Given that the two population means are equal, we may wish to test whether any significant difference exists in the spread around the mean values.

Let two independent random variables X and Y be drawn from the normal distributions \(\small{N(\mu_X, \sigma_X^2)}\) and \(\small{N(\mu_Y, \sigma_Y^2) }\).

In order to test whether their population variances are equal, we randomly draw n samples from the X distribution and m samples from the Y distribution and compute their respective sample variances \(\small{S_X^2 }\) and \(\small{S_Y^2 }\).

From the properties of the chi-square distribution, we know that the quantities \(\small{\dfrac{(n-1)S_X^2}{\sigma_X^2} }\) and \(\small{\dfrac{(m-1)S_Y^2}{\sigma_Y^2} }\) follow independent chi-square distributions \(\small{\chi^2(n-1) }\) and \(\small{\chi^2(m-1) }\) respectively.
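This scaling property can be checked by simulation. The sketch below (in Python with only the standard library; the parameter values and variable names are illustrative choices, not from the text) draws many samples of size n and verifies that the average of \(\small{(n-1)S^2/\sigma^2}\) is close to the chi-square mean of n-1:

```python
import random
import statistics

# Simulate many samples of size n from N(mu, sigma^2) and scale each
# sample variance by (n-1)/sigma^2; the scaled values should behave
# like a chi-square variable with n-1 degrees of freedom, whose mean
# is n-1.  (All parameter values here are illustrative.)
random.seed(1)
mu, sigma, n, trials = 1000.0, 50.0, 16, 20000

scaled = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance with the n-1 divisor
    scaled.append((n - 1) * s2 / sigma ** 2)

mean_scaled = statistics.fmean(scaled)
print(round(mean_scaled, 2))  # should come out close to n - 1 = 15
```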

Also, the ratio of two independent chi-square variables, each divided by its degrees of freedom, gives rise to an F distribution:

\(~~~~~\small{F = \dfrac{ \left(\dfrac{S_X^2}{\sigma_X^2}\right) } { \left(\dfrac{S_Y^2}{\sigma_Y^2}\right) } = \dfrac { \dfrac{\left[ \dfrac{(n-1)S_X^2 } {\sigma_X^2 } \right]}{(n-1)} } { \dfrac{\left[ \dfrac{(m-1)S_Y^2 } {\sigma_Y^2 } \right]}{(m-1)} } }~~~~~\) follows an F distribution with (n-1) and (m-1) degrees of freedom.

Under the null hypothesis \(\small{H_0~:~\sigma_X^2 = \sigma_Y^2}\), we get the F statistic for our test:

\(\small{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~F~=~\dfrac{S_X^2}{S_Y^2}~\sim~F(n-1, m-1) }\)

We also make use of an important property of the F distribution: if a variable F follows \(\small{F(n-1,m-1)}\), then its
reciprocal \(\small{\dfrac{1}{F} }\) follows \(\small{F(m-1, n-1)}\).
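This reciprocal property can be illustrated numerically. The following Python sketch uses the F quantile function f.ppf from scipy (the analogue of R's qf); the degrees of freedom are just example values:

```python
from scipy.stats import f  # f.ppf(p, d1, d2): F-distribution quantile function, like R's qf

# If F ~ F(d1, d2) then 1/F ~ F(d2, d1), so the upper p-quantile of
# F(d1, d2) must equal the reciprocal of the lower p-quantile of F(d2, d1).
d1, d2, p = 15, 15, 0.95
upper = f.ppf(p, d1, d2)        # like qf(0.95, 15, 15) in R
lower = f.ppf(1 - p, d2, d1)    # like qf(0.05, 15, 15) in R
print(round(upper, 2))          # the critical value used in the worked example below
print(round(upper * lower, 6))  # the product of the two quantiles should be 1
```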

The three alternative hypotheses can be tested as follows:

1. **Two-sided alternative hypothesis**

\(~~~~~~\small{H_0:~~\sigma_X^2~=~\sigma_Y^2 }\)

\(~~~~~~\small{H_A:~~\sigma_X^2~\ne~\sigma_Y^2 }\)

Reject the null hypothesis if \(~~\small{\dfrac{S_X^2}{S_Y^2} \ge F_{\alpha/2}(n-1,m-1 ) }~~\) or
\(~~~~~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{\alpha/2}(m-1,n-1 ) }\)

2. **One-sided alternative hypothesis**

\(~~~~~~\small{H_0:~~\sigma_X^2~\le~\sigma_Y^2 }\)

\(~~~~~~\small{H_A:~~\sigma_X^2~\gt~\sigma_Y^2 }\)

Reject the null hypothesis if \(~~\small{\dfrac{S_X^2}{S_Y^2} \ge F_{\alpha}(n-1,m-1 ) }~~\)

3. **One-sided alternative hypothesis**

\(~~~~~~\small{H_0:~~\sigma_X^2~\ge~\sigma_Y^2 }\)

\(~~~~~~\small{H_A:~~\sigma_X^2~\lt~\sigma_Y^2 }\)

Reject the null hypothesis if \(~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{\alpha}(m-1,n-1 ) }~~\)
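The three decision rules above can be gathered into one small helper. The sketch below is in Python, using scipy's F quantile function f.ppf (the analogue of R's qf); the function name and the dictionary it returns are illustrative choices, not from the text:

```python
import statistics
from scipy.stats import f  # f.ppf(p, d1, d2): F quantile function, like R's qf

def variance_f_test(x, y, alpha=0.05):
    """Apply the three rejection rules for comparing two population variances.

    Illustrative sketch: returns a dict of booleans, where True means the
    null hypothesis is rejected in favour of the corresponding alternative.
    """
    n, m = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # n-1 divisor
    # H_A: var_x != var_y -- reject if either ratio exceeds its upper alpha/2 quantile
    two_sided = (vx / vy >= f.ppf(1 - alpha / 2, n - 1, m - 1) or
                 vy / vx >= f.ppf(1 - alpha / 2, m - 1, n - 1))
    # H_A: var_x > var_y -- reject if S_x^2/S_y^2 >= F_alpha(n-1, m-1)
    greater = vx / vy >= f.ppf(1 - alpha, n - 1, m - 1)
    # H_A: var_x < var_y -- reject if S_y^2/S_x^2 >= F_alpha(m-1, n-1)
    less = vy / vx >= f.ppf(1 - alpha, m - 1, n - 1)
    return {"two_sided": two_sided, "greater": greater, "less": less}
```

For the bulb lifetime data analysed later in this section, all three entries come out False, in agreement with the R output at the end.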

Two brands of electric bulbs claim that their bulbs have a mean lifetime of 1000 hours. In order to test the claim, 16 bulbs of each brand were randomly drawn and their lifetimes in hours were determined by experiment. The results are presented below:

Bulb-A : \(\small{1067.7, ~984.3,~ 998.8,~ 1025.9,~ 1060.9,~ 959.1,~ 1013.8,~ 1047.0,~ 987.8,~ 1051.0,}\)

\( \small{~~~~~~~~~~~~~~~~~885.2,~1049.5,~ 1098.2,~ 1001.5,~ 1011.1,~ 991.6}\)

Bulb-B : \(\small{ 957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5,}\)

\( \small{~~~~~~~~~~~~~~~~~1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1}\)

Assuming that the lifetimes of Bulb-A and Bulb-B follow Gaussian distributions \(\small{N(\mu_x, \sigma_x^2 )}\) and \(\small{N(\mu_y, \sigma_y^2 )}\) respectively, test the null hypothesis \(\small{\sigma_x^2 = \sigma_y^2 }\) against the alternative hypothesis \(\small{\sigma_x^2 \lt \sigma_y^2 }\) at a significance level of \(\small{\alpha = 0.05}\).

In the given data, let X denote the Bulb-A data and Y the Bulb-B data.

From the data sets, we compute \(\small{S_X = 50.4 }\) and \(\small{S_Y = 69.1}\).

We have the number of data points n=m=16.

We compute \(\small{~~\dfrac{S_Y^2}{S_X^2}~=~\dfrac{69.1^2}{50.4^2}~=~1.879 }\)
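The sample standard deviations and the ratio above can be cross-checked with a few lines of Python (standard library only; the variable names are mine):

```python
import statistics

# Lifetimes from the worked example (Bulb-A = x, Bulb-B = y)
x = [1067.7, 984.3, 998.8, 1025.9, 1060.9, 959.1, 1013.8, 1047.0,
     987.8, 1051.0, 885.2, 1049.5, 1098.2, 1001.5, 1011.1, 991.6]
y = [957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3,
     1040.7, 1099.5, 1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1]

sx = statistics.stdev(x)   # sample standard deviation (n-1 divisor), like R's sd
sy = statistics.stdev(y)
# about 50.4, 69.1 and 1.88, agreeing with the values used in the text
print(round(sx, 1), round(sy, 1), round(sy ** 2 / sx ** 2, 2))
```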

\(\small{F_{\alpha}(m-1,n-1)~=~F_{0.05}(15,15)~=~2.4~~~~~~~~ }\)(the upper-tail critical value, obtained through the R function call "qf(0.95,15,15)").

While testing \(\small{\sigma_X^2 = \sigma_Y^2 }\) against the alternative hypothesis \(\small{\sigma_X^2 \lt \sigma_Y^2 }\), we reject the null if \(~~\small{\dfrac{S_Y^2}{S_X^2} \ge F_{\alpha}(m-1,n-1 ) }~~\).

In our case, since \(~~\small{\dfrac{S_Y^2}{S_X^2} \lt F_{\alpha}(m-1,n-1 ) }~~\), we fail to reject the null hypothesis at the 5% significance level: the data provide no significant evidence that the variance of the Bulb-A lifetimes is smaller than that of Bulb-B.

##################################################
## Script for testing the ratio of two population variances.
## x and y are the data vectors
## alpha = significance level for the test

two_sample_variance_test = function(x, y, alpha){

    ## compute the standard deviation and sample size from the data
    sx = sd(x)
    sy = sd(y)
    n = length(x)
    m = length(y)

    ## Two sided test : Alternative hypothesis : population variances are not equal.
    print("Two sided alternate hypothesis : population variances are not equal :")
    print(" ")
    variance_ratio = sx^2/sy^2
    F_critical = qf(1-alpha/2, n-1, m-1)
    ## the two sided test rejects when either ratio exceeds its upper
    ## critical value, so the reciprocal ratio is checked as well.
    inverse_ratio = sy^2/sx^2
    inverse_F_critical = qf(1-alpha/2, m-1, n-1)
    if( variance_ratio >= F_critical || inverse_ratio >= inverse_F_critical ){
        print("Null hypothesis rejected, alternative accepted")
        print(paste("var_x/var_y = ", variance_ratio, " or its reciprocal is greater than or equal to F_critical value ", F_critical))
    } else {
        print("Null hypothesis accepted, alternative rejected")
        print(paste("var_x/var_y = ", variance_ratio, " is less than F_critical value ", F_critical))
    }
    print("-----------------------------------------------------")

    ## One sided test : Alternate hypothesis : population variance of x > population variance of y
    print("One sided alternate hypothesis : population variance of x > population variance of y :")
    print(" ")
    variance_ratio = sx^2/sy^2
    F_critical = qf(1-alpha, n-1, m-1)
    variance_ratio = round(variance_ratio,2)
    F_critical = round(F_critical,2)
    if( variance_ratio >= F_critical){
        print("Null hypothesis rejected, alternative accepted")
        print(paste("var_x/var_y = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical))
    } else {
        print("Null hypothesis accepted, alternative rejected")
        print(paste("var_x/var_y = ", variance_ratio, " is less than F_critical value ", F_critical))
    }
    print("-----------------------------------------------------")

    ## One sided test : Alternate hypothesis : population variance of x < population variance of y
    print("One sided alternate hypothesis : population variance of x < population variance of y :")
    print(" ")
    variance_ratio = sy^2/sx^2
    F_critical = qf(1-alpha, m-1, n-1)   ## note that m-1 and n-1 have been swapped in position.
    variance_ratio = round(variance_ratio,2)
    F_critical = round(F_critical,2)
    if( variance_ratio >= F_critical){
        print("Null hypothesis rejected, alternative accepted")
        print(paste("var_y/var_x = ", variance_ratio, " is greater than or equal to F_critical value ", F_critical))
    } else {
        print("Null hypothesis accepted, alternative rejected")
        print(paste("var_y/var_x = ", variance_ratio, " is less than F_critical value ", F_critical))
    }
    print("-----------------------------------------")
    return(1)
}

### Testing
x = c(1067.7, 984.3, 998.8, 1025.9, 1060.9, 959.1, 1013.8, 1047.0, 987.8, 1051.0,
      885.2, 1049.5, 1098.2, 1001.5, 1011.1, 991.6)
y = c(957.6, 981.8, 1096.5, 984.4, 1074.3, 929.4, 1056.0, 1012.3, 1040.7, 1099.5,
      1006.1, 1064.3, 865.6, 944.4, 1091.8, 952.1)

two_sample_variance_test(x, y, 0.05)
###############------------------------------------------------

Executing the above script in R prints the following results on the screen:

[1] "Two sided alternate hypothesis : population variances are not equal :"
[1] " "
[1] "Null hypothesis accepted, alternative rejected"
[1] "var_x/var_y = 0.532487247172825 is less than F_critical value 2.86209253046351"
[1] "-----------------------------------------------------"
[1] "One sided alternate hypothesis : population variance of x > population variance of y :"
[1] " "
[1] "Null hypothesis accepted, alternative rejected"
[1] "var_x/var_y = 0.53 is less than F_critical value 2.4"
[1] "-----------------------------------------------------"
[1] "One sided alternate hypothesis : population variance of x < population variance of y :"
[1] " "
[1] "Null hypothesis accepted, alternative rejected"
[1] "var_y/var_x = 1.88 is less than F_critical value 2.4"
[1] "-----------------------------------------"