CORRELATION
By: SABIR Chdhry
In statistics, correlation describes the kind of relationship that exists between two categories or series of numbers, that is, how two sets of variables or series of data vary together. The correlation between two categories is mainly divided into the following two types:
1. Positive Correlation
2. Negative Correlation
Let us understand both types of correlation with examples.
1. Positive Correlation
If the values of two categories are related in such a way that a gradual increase in the values of one category is accompanied by an increase in the values of the second, or a gradual decrease in one is accompanied by a decrease in the other, the relationship is called positive correlation.
Positive correlation can be easily understood from Examples 1 and 2 below.
Example 1 |
X- Series | Y- Series |
100 | 50 |
110 | 55 |
120 | 60 |
130 | 65 |
140 | 70 |
In Example 1, as the values of category X increase, the values of category Y also increase, so the correlation between category X and category Y is positive.
Example 2 |
X- Series | Y- Series |
80 | 25 |
70 | 20 |
60 | 15 |
50 | 10 |
40 | 5 |
In Example 2, as the values of category X decrease, the values of category Y also decrease, so the correlation between category X and category Y is positive.
2. Negative Correlation
If the values of two categories are related in such a way that a gradual increase in the values of one category is accompanied by a gradual decrease in the values of the second, or a gradual decrease in one is accompanied by an increase in the other, the relationship is called negative correlation.
Negative correlation can be easily understood from Examples 3 and 4 below.
Example 3 |
X- Series | Y- Series |
100 | 50 |
110 | 45 |
120 | 40 |
130 | 35 |
140 | 30 |
In Example 3, the values of category X increase while the values of category Y decrease, so the correlation between category X and category Y is negative.
Example 4 |
X- Series | Y- Series |
100 | 50 |
90 | 55 |
80 | 60 |
70 | 65 |
60 | 70 |
In Example 4, the values of category X decrease while the values of category Y increase, so the correlation between category X and category Y is negative.
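The four examples can also be checked mechanically. The following is a minimal Python sketch, not a method given in this article: it assumes the direction of the relationship can be read from the sign of the summed products of each value's deviation from its series mean. The function name `direction` and the `examples` dictionary are chosen here purely for illustration.

```python
def direction(x, y):
    """Return 'positive', 'negative' or 'none' for two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (an assumption of this
    # sketch): positive when the series move together, negative when one
    # rises as the other falls.
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"


# Data copied from Examples 1 to 4 above.
examples = {
    "Example 1": ([100, 110, 120, 130, 140], [50, 55, 60, 65, 70]),
    "Example 2": ([80, 70, 60, 50, 40], [25, 20, 15, 10, 5]),
    "Example 3": ([100, 110, 120, 130, 140], [50, 45, 40, 35, 30]),
    "Example 4": ([100, 90, 80, 70, 60], [50, 55, 60, 65, 70]),
}

for name, (x, y) in examples.items():
    print(name, direction(x, y))
```

Under these assumptions the sketch reports Examples 1 and 2 as positive and Examples 3 and 4 as negative, in line with the explanations above.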
Coefficient of Correlation
The amount of correlation between two categories is called the correlation coefficient. The maximum correlation between two categories is +1; if the correlation between two series is +1, it is called Perfect Positive Correlation. The minimum correlation between two categories is -1; if the correlation between two series is -1, it is called Perfect Negative Correlation.
The remaining values of the coefficient are interpreted as shown in the following table:
Coefficient of Correlation | Level of Correlation |
+0.75 to +1 | High Level of Positive Correlation |
+0.25 to +0.75 | Medium Level of Positive Correlation |
More than 0 but less than +0.25 | Low Level of Positive Correlation |
-0.75 to -1 | High Level of Negative Correlation |
-0.25 to -0.75 | Medium Level of Negative Correlation |
Less than 0 but more than -0.25 | Low Level of Negative Correlation |
0 | No Correlation |
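The table can be expressed as a small lookup function. This is only a sketch in Python: the function name `correlation_level` is hypothetical, and because the table lists 0.25 and 0.75 in two bands each, the sketch assumes a boundary value belongs to the higher band.

```python
def correlation_level(r):
    """Map a correlation coefficient to the level named in the table."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "No Correlation"
    sign = "Positive" if r > 0 else "Negative"
    size = abs(r)
    # Assumption of this sketch: exactly 0.25 or 0.75 counts as the
    # higher of the two bands that list it.
    if size >= 0.75:
        level = "High"
    elif size >= 0.25:
        level = "Medium"
    else:
        level = "Low"
    return f"{level} Level of {sign} Correlation"


print(correlation_level(0.9))
print(correlation_level(-0.1))
```

A coefficient of exactly +1 or -1 falls in the High band here; the terms Perfect Positive Correlation and Perfect Negative Correlation from the paragraph above are not handled separately.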
The question now is how to calculate the correlation between any two categories.
There are many methods of calculation, some of which are explained below.
Spearman's Rank Difference Method
The following formula is used to calculate the coefficient of correlation:
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
Here
1. ρ (rho) is a Greek letter; it denotes the correlation coefficient.
2. N is the number of values in each series. In Example 4, for instance, X and Y have five values each, so N = 5.
3. D is the rank difference. To calculate D, first rank the values of category X and of category Y separately according to their value: the highest value is ranked 1, the second highest is ranked 2, and so on.
D is the rank of category X minus the rank of category Y. Because D is squared in the next step, its sign does not affect the result.
4. Now square all the values of D. We call the result D².
5. Adding all the values of D² gives ΣD². Σ (sigma) is also a Greek letter; it denotes a sum.
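The ranking and differencing steps can be sketched in Python using the Example 4 data. This is an illustration rather than the author's own procedure: the helper name `ranks` is hypothetical, it assumes no two values in a series are equal (ties need extra handling that is not covered here), and D is taken as rank X minus rank Y, as in the worked table for Question 1.

```python
def ranks(values):
    """Rank values so the highest gets rank 1 (assumes no tied values)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


# Example 4 data.
x = [100, 90, 80, 70, 60]
y = [50, 55, 60, 65, 70]

n = len(x)                                        # N
rank_x = ranks(x)                                 # ranks of category X
rank_y = ranks(y)                                 # ranks of category Y
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]   # D = rank X - rank Y
d_squared = [value ** 2 for value in d]           # D squared
sum_d_squared = sum(d_squared)                    # sum of D squared

print(n, rank_x, rank_y, d, d_squared, sum_d_squared)
```

For Example 4 this gives N = 5 and ΣD² = 40, because the two series are ranked in exactly opposite order.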
The rest will become clear from the worked question below.
Question 1: Find the correlation coefficient from the data given below by the rank difference method.
Series X | Series Y |
8 | 84 |
36 | 51 |
98 | 91 |
25 | 60 |
75 | 68 |
82 | 62 |
92 | 86 |
62 | 58 |
65 | 35 |
39 | 49 |
We now rank all the values of category X and category Y, then take the difference between the ranks of the two categories to calculate D, D² and ΣD², as shown in the table below.
Value X | Rank X | Value Y | Rank Y | D = Rank X - Rank Y | D² |
8 | 10 | 84 | 3 | 7 | 49 |
36 | 8 | 51 | 8 | 0 | 0 |
98 | 1 | 91 | 1 | 0 | 0 |
25 | 9 | 60 | 6 | 3 | 9 |
75 | 4 | 68 | 4 | 0 | 0 |
82 | 3 | 62 | 5 | -2 | 4 |
92 | 2 | 86 | 2 | 0 | 0 |
62 | 6 | 58 | 7 | -1 | 1 |
65 | 5 | 35 | 10 | -5 | 25 |
39 | 7 | 49 | 9 | -2 | 4 |
Here N is 10, since each series contains 10 values, and ΣD² is 92, since adding all the values of D² gives 92.
Substituting these values into the formula gives
$$\rho = 1 - \frac{6 \times 92}{10(10^2 - 1)} = 1 - \frac{552}{990} \approx 0.44$$
Since the answer is positive, there is a positive correlation between the categories, and since the value is more than +0.25 and less than +0.75, it is a medium level of positive correlation.
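The whole calculation for Question 1 can be reproduced with a short Python sketch. The function names `ranks` and `spearman_rho` are hypothetical, and the sketch assumes the standard rank difference formula ρ = 1 - 6ΣD² / (N(N² - 1)) with no tied values in either series.

```python
def ranks(values):
    """Rank values so the highest gets rank 1 (assumes no tied values)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Rank difference coefficient: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    if len(x) != len(y):
        raise ValueError("both series must have the same number of values")
    n = len(x)
    # D = rank X - rank Y for each pair, squared and summed.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Data from Question 1.
series_x = [8, 36, 98, 25, 75, 82, 92, 62, 65, 39]
series_y = [84, 51, 91, 60, 68, 62, 86, 58, 35, 49]

rho = spearman_rho(series_x, series_y)
print(round(rho, 4))  # 0.4424
```

Applying the same function to Example 1 returns 1.0 and to Example 4 returns -1.0, which correspond to Perfect Positive Correlation and Perfect Negative Correlation.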
Author: Sabir Chdhry