Mathematical tools for natural sciences

When two variables X and Y are measured together, there are three possibilities for how they are related:

1. There is no relationship between X and Y, i.e., they are independent of each other.

2. When X increases, Y also increases. The two variables are said to be positively correlated.

3. When X increases, Y decreases. The two variables are said to be negatively correlated.

See the figure below:

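If X and Y are two data sets of size n with sample means \(\small{\overline{x}}\) and \(\small{\overline{y}}\) respectively, then the sample covariance of X and Y is defined as

\(\small{Cov(X,Y) = \dfrac{1}{n-1} \displaystyle\sum_{i=1}^{n} (x_i-\overline{x})(y_i-\overline{y})}\)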

Cov(X,Y) is close to zero when X and Y are independent variables (uncorrelated).

Cov(X,Y) is positive when Y increases as X increases (positive correlation).

Cov(X,Y) is negative when Y decreases as X increases (negative correlation).

Suppose X and Y are independent variables. Then the sign of \(\small{x_i-\overline{x}}\) is independent of the sign of \(\small{y_i-\overline{y}}\), and their product has an equal chance of being negative or positive. The positive and negative products largely cancel, so their sum is a small number close to zero.

Assume that when X increases, Y also increases. Then for most of the data points, \(\small{x_i-\overline{x}}\) and \(\small{y_i-\overline{y}}\) take the same sign (i.e., \(\small{x_i}\) is below \(\small{\overline{x}}\) when \(\small{y_i}\) is below \(\small{\overline{y}}\), and \(\small{x_i}\) is above \(\small{\overline{x}}\) when \(\small{y_i}\) is above \(\small{\overline{y}}\)). Most of the products are therefore positive, making their sum a large positive number.

Assume that when X increases, Y decreases. Then for most of the data points, \(\small{x_i-\overline{x}}\) and \(\small{y_i-\overline{y}}\) take opposite signs (i.e., \(\small{x_i}\) is above \(\small{\overline{x}}\) when \(\small{y_i}\) is below \(\small{\overline{y}}\), and \(\small{x_i}\) is below \(\small{\overline{x}}\) when \(\small{y_i}\) is above \(\small{\overline{y}}\)). Most of the products are therefore negative, making their sum a large negative number.
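The sign argument above can be checked numerically. The following is a minimal sketch in R, assuming the sample (n-1) normalization that R's built-in `cov` uses. The data values are made up for illustration and are not taken from the text.

```r
# Hypothetical toy data (not from the text), chosen so that Y tends to rise with X.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviation of each point from its sample mean.
dx <- x - mean(x)
dy <- y - mean(y)

# A product is positive when both deviations share a sign and negative otherwise.
products <- dx * dy

# Sample covariance: sum of the products divided by (n - 1).
cov_manual <- sum(products) / (length(x) - 1)

print(products)
print(cov_manual)
print(cov(x, y))  # built-in function, expected to agree with cov_manual
```

None of the products is negative for this data, so the sum, and with it the covariance, comes out positive.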

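The magnitude of the covariance depends on the units and scales of X and Y, so its value alone does not tell us how strong the relationship is. In order to tackle this problem, a term called the correlation coefficient is defined.

There are many definitions of the correlation coefficient. The Pearson correlation coefficient \(\small{R_{xy}}\) divides the covariance by the product of the sample standard deviations \(\small{s_x}\) and \(\small{s_y}\) of X and Y:

\(\small{R_{xy} = \dfrac{Cov(X,Y)}{s_x~s_y}}\)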

\(\small{R_{xy} = 0}\) when X and Y are uncorrelated.

\(\small{R_{xy} = 1}\) when X and Y have a perfect positive correlation.

\(\small{R_{xy} = -1}\) when X and Y have a perfect negative correlation.

If the correlation between X and Y is not perfect, then a number between 0 and 1 indicates positive correlation and a number between -1 and 0 indicates negative correlation:

\(\small{ 0 \lt R_{xy} \lt 1}\) is the region of positive correlation.

\(\small{ -1 \lt R_{xy} \lt 0}\) is the region of negative correlation.
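These ranges can be illustrated with a short R sketch. It assumes Pearson's definition, i.e. the covariance divided by the product of the sample standard deviations. `pearson_r` is a hypothetical helper written for this illustration, and the data values are made up.

```r
# Hypothetical helper (not part of base R): Pearson's r computed as the
# covariance divided by the product of the sample standard deviations.
pearson_r <- function(x, y) {
  cov(x, y) / (sd(x) * sd(y))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)        # made-up data that rises with x, but not perfectly

r_up   <- pearson_r(x, 2 * x + 1)   # exact increasing straight line
r_down <- pearson_r(x, -3 * x)      # exact decreasing straight line
r_mid  <- pearson_r(x, y)           # imperfect positive relationship

# Rescaling x (for example, a change of units) changes the covariance
# but leaves the correlation coefficient unchanged.
cov_scaled <- cov(100 * x, y)
r_scaled   <- pearson_r(100 * x, y)

print(c(r_up, r_down, r_mid))
print(c(cov(x, y), cov_scaled))
print(c(r_mid, r_scaled))
```

The two exact straight lines give the limiting values, while the imperfect data falls strictly between 0 and 1. The built-in `cor(x, y)` should agree with `r_mid`.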

In R, the function cov() computes the covariance of two data sets. Similarly, the function cor() computes their correlation coefficient.

Both functions take similar arguments:

cov(x,y) returns the covariance.
cor(x,y) returns Pearson's correlation coefficient.

where
x = a vector of data set X
y = a vector of data set Y

These two functions are used in the R script below.

##################################################
## Compute the covariance and correlation for the following dataset:

x = c(10,20,30,40,50,60,70,80,90,100)
y = c(95, 220, 279, 424, 499, 540, 720, 880, 950, 1200)

## Sample covariance and Pearson's correlation coefficient
cv = cov(x,y)
cr = cor(x,y)

print(paste("covariance =", round(cv, digits=3)))
print(paste("Pearson's correlation coefficient =", round(cr, digits=3)))
##################################################

Executing the above script in R prints the following results on the screen:

[1] "covariance = 10549.444"
[1] "Pearson's correlation coefficient = 0.988"
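As a rough cross-check, these two numbers can be reproduced by hand from the data in the script. The sketch below assumes the sample (n-1) normalization, which is what R's `cov` uses, and relies on the identity \(\small{\sum_i (x_i-\overline{x})(y_i-\overline{y}) = \sum_i x_i y_i - n\,\overline{x}\,\overline{y}}\).

```latex
\begin{align*}
n &= 10, \qquad \overline{x} = 55, \qquad \overline{y} = 580.7 \\
\sum_{i}(x_i-\overline{x})(y_i-\overline{y})
  &= \sum_i x_i y_i - n\,\overline{x}\,\overline{y}
   = 414330 - 319385 = 94945 \\
Cov(X,Y) &= \frac{94945}{n-1} = \frac{94945}{9} \approx 10549.444 \\
\sum_i (x_i-\overline{x})^2 &= 8250, \qquad
\sum_i (y_i-\overline{y})^2 = 1118818.1 \\
R_{xy} &= \frac{94945}{\sqrt{8250 \times 1118818.1}} \approx 0.988
\end{align*}
```

The factors of n-1 cancel between the covariance and the two standard deviations, which is why the last line can use the raw sums directly.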