## Wilcoxon signed rank test for two dependent (paired) variables

The Wilcoxon signed rank test for two dependent variable is a nonparamteric eqivalent to the two sample paired t test. The t test is used for two continuous variables which are either assumed to follow a Gaussian distribution or near Gaussian distributions, which can be taken care with the help of central limit theorem. Wilcoxon test is used when we do not know the shape of the two underlying distributions.

Since the median is more robust against the outlier when compared to mean, Wilcoxon signed rank test tests for the median value of the population.

The basic assumptions for the test are:

$~~~~~~~~(i)~~$ The paired data sets are randomly drawn from their respective continuous distributions.

$~~~~~~~~(ii)~~$ The measurements can be ordered.

Let $\small{(x_i,y_i)}$ a set of n paired observations. Let m be the median of the differences $\small{d_i = x_i-y_i }$.

The Wilcoxon test is used for testing the null hypothesis $\small{H_0:~m=0 }$ against any one of the three altrnate hypothesis $H_A: m \neq 0~~$, $H_A: m \gt 0~$ and $~H_A: m \lt 0$.

For testing the hypothesis on popultion median, we do not compute the median of the differences as a statistic. Hence the name "nonparametric". Instead, we find the absolute deviations $\left|{x_i - y_i}\right|$ of each point pair $(x_i,y_i)$ and rank these difference values to get a test statistic. The analysis is exactly similar to that of Wilcoxon signed rank test for one variable.

$~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~$ Procedure for computing test statistic

We have n paires of random observations $\small{(x_i,y_i)}$ of the dependent variables X and Y.

1. Compute the differences $\small{x_i - y_i}$ of each pair of data points to get $~~~~~~~~~~~(x_1 - y_1), (x_2 - y_2),............,(x_n - y_n)$
These differences can be positive ($\small{x_i - y_i~\gt m_0}$), negative ($\small{x_i-y_i~\lt 0 }$) or sometimes zero $\small{(x_i=y_i)}$. Ignore the zero differences in the calculations ahead.

2. Find the absolute values of these devitions:
$\small{~~~~~~~~~~~\left|{x_1 - y_1}\right|, \left|{x_2 - y_2}\right|,............,\left|{x_n - y_n}\right|}$

3. Rank these absolute values in ascending order. Let $\small{R_i}$ be the rank of absolute value $\small{\left|{x_i - y_i}\right|}$. If the absolute difference from two or more observations are equal, each difference is assigned the average of the corresponding ranks.
Thus the sequence $R_1, R_2, R_3, ....., R_n$ is a permutation of positive integers 1,2,3,....,n.
For example, $R_1=4, R_2=7, R_3=9,.......,R_n=1$.

1. Compute the differences $\small{x_i - y_i}$ of each pair of data points to get $~~~~~~~~~~~(x_1 - y_1), (x_2 - y_2),............,(x_n - y_n)$
These differences can be positive ($\small{x_i - y_i~\gt m_0}$), negative ($\small{x_i-y_i~\lt 0 }$) or sometimes zero $\small{(x_i=y_i)}$. Ignore the zero differences in the calculations ahead.

2. Find the absolute values of these devitions:
$\small{~~~~~~~~~~~\left|{x_1 - y_1}\right|, \left|{x_2 - y_2}\right|,............,\left|{x_n - y_n}\right|}$

3. Rank these absolute values in ascending order. Let $\small{R_i}$ be the rank of absolute value $\small{\left|{x_i - y_i}\right|}$. If the absolute difference from two or more observations are equal, each difference is assigned the average of the corresponding ranks.
Thus the sequence $R_1, R_2, R_3, ....., R_n$ is a permutation of positive integers 1,2,3,....,n.
For example, $R_1=4, R_2=7, R_3=9,.......,R_n=1$. 4. Now, with each rank $R_i$, associate the sign of the difference $x_i - y_i$.
For example, if $\small{(x_1-y_1) \gt 0}$, then $R_1$ is positive.
If $\small{(x_3-y_3) \lt 0 }$, then $R_3$ is negative.
Thus $\small{R_1, R_2, ...., R_n}$ have become the signed ranks of the deviations from median value tested.

5. Compute the Wilcoxon test statistics W by adding the positive and negative signed ranks separately:

$~~~~~~~~~~~~~~~~~~\small{W+~=~Sum~of~positive~ranks }~~~$

$~~~~~~~~~~~~~~~~~~\small{W-~=~Sum~of~negative~ranks }~~~$

We take the minimum among W+ and W- as the test statistic.$\small{~~W~=~min(W+,W-) }$

(Chossing W+ or W- for test statistic will lead to same result).

The value of W varies from a minimum of zero to $\small{\dfrac{n(n+1)}{2}}$ (all values are either positive or negative).

Also, $~~~\small{\left|W+\right|~+~\left|W-\right|~=~\dfrac{n(n+1)}{2}},~~~$ which is the sum of n integers 0,1,2,3,...,n.

6. Knowing the probability distribution of W, we can determine a p-vlue of w under the null hypothesis.

$~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~$ Hypothesis testing

If the null hypothesis $\small{H_0:~m = 0 }$ is true, we have drawn data points from a distribution whose median is 0. In this case, about half of the oberved differences $\small{x_i-y_i }$ would be positive and about half of them negative. Consequently, about half of the ranks $\small{R_i}$ values will be positive and about half of them negative.

The value of W varies from a minimum of zero to a maximum of $\small{\dfrac{n(n+1)}{2}}$, which is the sum of n integers 0,1,2,...,n when all of them are positive.

If in reality $m \gt 0$, we will be randomly drawing data points from a distribution whose median is greater tham 0, making most of the deviations $\small{x_i-y_i }$ and hence the ranks $\small{R_i}$ positive, resulting in a W+ that is too large compared to W-. Thus, for an alternate hypothesis $\small{H_A: m \gt 0}$, we reject the null $\small{H_0}$ if $\small{W- \leq w_c}$ where $\small{w_c}$ is the critical value of the test.

If $m \lt 0$, we will be randomly drawing data points from a distribution whose median is less tham $0$, making most of the deviations $\small{x_i-y_i }$ and hence the ranks $\small{R_i}$ negative, resulting in a W+ that is too small compared to W-. Thus, for an alternate hypothesis $\small{H_A: m \lt 0}$, we reject the null $\small{H_0}$ if $\small{W+ \leq w_c}$ where $\small{w_c}$ is the critical value of the test.

For a two sided alternate hypothesis $\small{H_A: m \neq 0}$, we reject the null $\small{H_0}$ if $\small{W \leq w_c}$.

In order to compute the critical values like $w_c$, we need the probability distribution W. This can be computed by two methods:

Method-1 : Direct computation :

For a given sample size n, the signed rank sum W is the summation of integers $1,2,3,...n$ after randomly assigning a minus of plus sign with equal probability to each one of them. We can compute all the possible value for W given n,thus getting the probability distribution of W. From this, we can directly compute the probability of gettting W value above or below a critical value.

For example, let $n=2$. For this, W is a sum of two numbers (1,2) each with two possible signs (+,-) assigned randomly.

This gives 4 possible values of W namely, {(-1-2),(1-2),(-1+2),(1+2)} or {-3,-1,1,3}. Since their probability should sum to 1, their probability distribution is $P(-3)=\dfrac{1}{4},~~P(-1)=\dfrac{1}{4},~~P(1)=\dfrac{1}{4},~~P(3)=\dfrac{1}{4}$.

Similarly, for any given n, the P(W) can be computed for all possible values of W using algorithms implemented in computer programs. Some algorithms comput the same distribution P(W) using statistical methods like moment generating functions. Finally a critical value table for Wilcoxon signed rank tests is created, like the one found here. For a given number of samples n used in the test, we can get the smallest rank total $\small{W_c}$ for which the probablity level is equal to or less than a statistical significance $\alpha$ from this table. Our observed statistic W is statistically significant if it is equal to or less than this rank total $\small{W_c }$. In other words, we reject the null if $\small{W \leq W_c}$

Summary of decisionon rules for the critical value table:

Compute $\small{W+,~W-}$ values from the data.

Let $\small{W~=~min(W+,W-)}$ be the test statistic.

For a given sample size n and $\small{\alpha}$ value, compute the critical value $\small{w_c}$ from the tables mentioned above.

If alternative hypothesis is $\small{H_A:~m \neq m_0}$, reject $\small{H_0:~m=m_0 }$ if $\small{W \leq w_c }$.

If alternative hypothesis is $\small{H_A:~m \gt m_0}$, reject $\small{H_0:~m=m_0 }$ if $\small{W- \leq w_c }$.

If alternative hypothesis is $\small{H_A:~m \lt m_0}$, reject $\small{H_0:~m=m_0 }$ if $\small{W+ \leq w_c }$.

Method-2 : Large sample approximation :

When the number of samples n is large, a normal approximation can be obtained for W using central limit theorem.It is given by,

$\small{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Z~=~\dfrac{W~-~\dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}~\approx N(0,1)}~~~~$, a unit normal distibution.

The above approximation is good when $n \sim 10$ or above. When the value of n is lower, it is a poor approximation. We can thus get a p-value or critical value for a given W value computed from the data.

Correction for the ties :

In the case of ties (same numbers repeared two or more times), we replaced the ranks of repeated numbers by their mean values. The ties thus affect the standard deviations of the statistic W.

A correction due to ties is applied to reduce the standard deviation term in the denominator of Z statistics described above. The corrected Z statistic is given by,

$\small{~~~~~~~~~~~~~Z~=~\dfrac{W-\dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}~-~\displaystyle\sum_t\dfrac{t^3 - t}{48}}} }$

The correction term is given by $\small{\displaystyle\sum_t\dfrac{t^3 - t}{48} }$, where t takes the value of number of repeats in each instance.

For example, consider the following data set which is arranged in the ascending order:

X = {7.9,8.3,8.9,8.9,10.5,10.8,11.6,11.6,12.9,13.5,14.9,16.2,16.2,16.2,17.9,23.2,25.6,25.9,30.2 }

In this data, 8.9 repeats twice, 11.6 repeats twice and 16.2 repeats thrice. The correction term therefore is,

$\small{ \displaystyle\sum_t\dfrac{2^3 - 2}{48}~+~\displaystyle\sum_t\dfrac{2^3 - 2}{48}~+~\displaystyle\sum_t\dfrac{3^3 - 3}{48}~=~ 0.75 }$

Since the above data set has 19 data points,

$\small{\dfrac{n(n+1)(2n+1)}{24}~=~\dfrac{19(20)(39)}{24}~= 617.5 }$

We see that the correction is negligible when compared to the value of variance without correction when n is large and there are few instances of smaller repeats. This can be included if necessary.

Example-1 :

A clinical trial was conducted to test the effectiveness of a psycological therapy in reducing the arthritis pain. Fourteen patients who were undegoing standard treatment for the disease were selected, and the pain level experienced by each individual was measured before and after six weeks of therapy. The paired data is presented here in a pain scale which goes from 0 (no pain) to 100 (maximum pain).

Patient ID: $\small{~~~~~~~~~1~~~~~~2~~~~~~3~~~~~~~4~~~~~~5~~~~~~6~~~~~~7~~~~~~~8~~~~~~9~~~~~~10~~~~11~~~~12~~~~13~~~~14 }$

Pre-therapy :$\small{~~~~~74~~~~72~~~~62~~~~58~~~~59~~~~65~~~~54~~~~63~~~~80~~~~66~~~~65~~~~64~~~~79~~~~60 }$

Post-therapy :$\small{~~~~79~~~~55~~~~53~~~~53~~~~74~~~~55~~~~64~~~~55~~~~39~~~~44~~~~37~~~~68~~~~54~~~~54 }$

Assuming the pain scale to be continuous, test whether there is a significant reduction in the pain after therapy when compred to the pre-therapy level. Use $\small{\alpha=0.05}$ as a significance level for thee test.

Let us denote the pre and post therapy data by variables X and Y respectively. We prepare the table with difference values and their ranks as shown below:

Assuming the pain scale to be continuous, test whether there is a significant reduction in the pain after therapy when compred to the pre-therapy level. Use $\small{\alpha=0.05}$ as a significance level for the test.

Let m be the median of the difference distribution. In order to test whether there is a significance reduction in pain, we test the null hypothesis $\small{H_0:~m=0}$ against an alternate hypothesis $\small{H_1:~m \gt 0}$

Let us denote the pre and post therapy data by variables X and Y respectively. We prepare the table with difference values and their ranks as shown below:

_______________________________________________________________________________________ $~~~~~x_i~~~~~~~~~~y_i~~~~~~~~~~x_i-y_i~~~~~~~~~~\left|x_i-y_i\right|~~~~~~~Rank~~~~~~~~signed~Rank$ _______________________________________________________________________________________
$\small{~~~~~74~~~~~~~~~~~~79~~~~~~~~~~~~-5~~~~~~~~~~~~~~~~~~~~~5~~~~~~~~~~~~~~~~~~~~2.5~~~~~~~~~~~~~~~~~-2.5}$
$\small{~~~~~72~~~~~~~~~~~~55~~~~~~~~~~~~~~17~~~~~~~~~~~~~~~~~~~~~~17~~~~~~~~~~~~~~~~~~10~~~~~~~~~~~~~~~~~~~~~~~10}$
$\small{~~~~~62~~~~~~~~~~~~53~~~~~~~~~~~~~~9~~~~~~~~~~~~~~~~~~~~~~~~9~~~~~~~~~~~~~~~~~~~~~6~~~~~~~~~~~~~~~~~~~~~~~~~6}$
$\small{~~~~~58~~~~~~~~~~~~53~~~~~~~~~~~~~~5~~~~~~~~~~~~~~~~~~~~~~~~5~~~~~~~~~~~~~~~~~~~~~2.5~~~~~~~~~~~~~~~~~~~~~2.5}$
$\small{~~~~~59~~~~~~~~~~~~74~~~~~~~~~~-15~~~~~~~~~~~~~~~~~~~~~15~~~~~~~~~~~~~~~~~~9~~~~~~~~~~~~~~~~~~~~-9}$
$\small{~~~~~65~~~~~~~~~~~~55~~~~~~~~~~~~~~10~~~~~~~~~~~~~~~~~~~~~~10~~~~~~~~~~~~~~~~~~7.5~~~~~~~~~~~~~~~~~~~~~~7.5}$
$\small{~~~~~54~~~~~~~~~~~~64~~~~~~~~~~-10~~~~~~~~~~~~~~~~~~~~~10~~~~~~~~~~~~~~~~~~7.5~~~~~~~~~~~~~~~~~-7.5}$
$\small{~~~~~63~~~~~~~~~~~~55~~~~~~~~~~~~~~8~~~~~~~~~~~~~~~~~~~~~~~~8~~~~~~~~~~~~~~~~~~~~5~~~~~~~~~~~~~~~~~~~~~~~~~~5}$
$\small{~~~~~80~~~~~~~~~~~~39~~~~~~~~~~~~~~41~~~~~~~~~~~~~~~~~~~~~~41~~~~~~~~~~~~~~~~~~14~~~~~~~~~~~~~~~~~~~~~14}$
$\small{~~~~~66~~~~~~~~~~~~44~~~~~~~~~~~~~~22~~~~~~~~~~~~~~~~~~~~~~22~~~~~~~~~~~~~~~~~~11~~~~~~~~~~~~~~~~~~~~~~~~11}$
$\small{~~~~~65~~~~~~~~~~~~37~~~~~~~~~~~~~~28~~~~~~~~~~~~~~~~~~~~~28~~~~~~~~~~~~~~~~~~~~13~~~~~~~~~~~~~~~~~~~~~13}$
$\small{~~~~~64~~~~~~~~~~~~68~~~~~~~~~~~~-4~~~~~~~~~~~~~~~~~~~~~4~~~~~~~~~~~~~~~~~~~~1~~~~~~~~~~~~~~~~~~~~-1}$
$\small{~~~~~79~~~~~~~~~~~~54~~~~~~~~~~~~~~25~~~~~~~~~~~~~~~~~~~~~25~~~~~~~~~~~~~~~~~~~12~~~~~~~~~~~~~~~~~~~~~~12}$
$\small{~~~~~60~~~~~~~~~~~~54~~~~~~~~~~~~~~6~~~~~~~~~~~~~~~~~~~~~~~~6~~~~~~~~~~~~~~~~~~~~4~~~~~~~~~~~~~~~~~~~~~~~~~~4}$
_____________________________________________________________________________________________________

The sum of positive and negative ranks is found to be,

$\small{~~~~~~~~W+~=~10+6+2.5+7.5+5+14+11+13+12+4~=~85 }$

$\small{~~~~~~~W-~=~ 2.5+9+7.5+1~=~20 }$

We check whether the sum of positive and negative differences equlas $\small{\dfrac{n(n+1)}{2} }~$:

$\small{\left|W+\right|~+~\left|W-\right\|~=~85+20~=~105~=~\dfrac{14(14+1)}{2} }$

We take $\small{W~=~min(W+,W-)~=min(85,20)~=~20 }$.

Since we have n=16 data points, we can compute the Z statistics under large sample approximation for $\small{W=20}$ to get,

$\small{~~~~~~~~~~~~~Z~=~\dfrac{W-\dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}~=~\dfrac{20-\dfrac{14(15+1)}{4}}{\sqrt{\dfrac{14(15)(29)}{24}}} =~-2.04}$

( Note:~If we use $\small{W-~=~85}$ in the above expression, we get a symmetric value of $\small{Z~=~2.04}$ and the final conclusions will be identical. )

The critical value for the test at significance of $\alpha=0.05$ can be obtained from unit Gaussian table as,

$\small{Z_c~=~Z_{\alpha}~=~Z_{0.05}~=~-1.645 }$

Since the Z statistic value of -2.04 is outside the critical region $Z_c=-1.645$, we reject the null hypothesis to conclude that for the given sample size, the data supports the alternate hypothesis $m \gt 0$.

We also compute the p-value for Z=-2.04 using the R function call "pnorm(-2.04)" to be 0.02, which rejects the null hypothesis at a significance level of $\alpha=0.05$.

For this test, we can also get the critical value using the above mentioned Wiloxon table. From this table, for a sample size $n=14$, we get a vlue of 25 as the smallest rank total for which the probability is less than or equal to $\small{\alpha=0.05}$.

Since the rank sum $\small{W=20 }$ from our data is less than the critical value 25, we reject th null hypothesis against the alternate hypothesis $\small{H_A: m \gt 0}$ at a significant level of 0.05.

## R-scripts


The Wilcoxon family of tests can be performed in R using the wilcox.test() function.

Important functon parameters are:

wilcox.test(x, y = NULL, alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, exact = NULL, correct = TRUE,
conf.int = FALSE, conf.level = 0.95, ...)

x, y  -------> two data vectors. y=NULL makes it a single vriable test.

alternative  ------>  type of alterntive hypothesis.

mu  ------> population mean to be compare. zero by default

paired   -----> In the case of two sample test, are the samples paired or not?

exact  ------>  a logical indicating whether an exact p-value should be computed.
By default, exact=NULL

correct  ---- a logical indicating whether to apply continuity correction
in the normal approximation for the p-value.

conf.int  ---- a logical indicating whether confidence intervl should be computed.

conf.level  ---- confidence level for the test.



## Wilcoxon signed rank test for one variable

Pre_therapy = (74, 72, 62, 58, 59, 65, 54, 63, 80, 66, 65, 64, 79, 60)

Post_therapy =  (79, 55, 53, 53, 74, 55, 64, 55, 39, 44, 37, 68, 54, 54)

res = wilcox.test(Pre_therapy, Post_therapy, conf.level=0.95, alternative=c("greater"),
conf.interval=TRUE, paired=TRUE )

print(res)



Executing the above script in R prints the following output.


data:  Pre_therapy and Post_therapy
V = 85, p-value = 0.02222
alternative hypothesis: true location shift is greater than 0