Karl Pearson’s Correlation Coefficient Formula, Methods & Examples
Karl Pearson’s Correlation Coefficient is a method in statistics used to measure how strongly two sets of data are related. It tells us whether the change in one variable is connected to the change in another.
This coefficient is also called Pearson’s r, and it’s often used when studying relationships in linear regression.
- If the value is positive, it means both variables increase or decrease together.
- If the value is negative, one variable increases while the other decreases.
- If the value is zero, it means there is no linear relationship between the two variables.
This method gives a number between -1 and +1, which shows how strong or weak the connection is. It’s a helpful tool in comparing trends and patterns in data.
What is Karl Pearson’s Correlation Coefficient?
Karl Pearson’s coefficient of correlation is a linear correlation coefficient whose value lies in the range -1 to +1. A value of -1 signifies a perfect negative correlation, while +1 indicates a perfect positive correlation.
There are 3 assumptions of Karl Pearson’s coefficient of correlation:
- Assumption 1: The variables x and y are linearly related.
- Assumption 2: There is a cause-and-effect relationship between factors affecting the values of the variables x and y.
- Assumption 3: The random variables x and y are normally distributed.
Degree of Correlation
- Perfect: If the value is at or very near ±1, the correlation is said to be perfect: as one variable increases, the other tends to also increase (if positive) or decrease (if negative).
- High degree: If the coefficient lies between ±0.50 and ±1, it is said to be a strong correlation.
- Moderate degree: If the value lies between ±0.30 and ±0.49, it is said to be a medium correlation.
- Low degree: If the value lies below ±0.29, it is said to be a small correlation.
- No correlation: When the value is zero.
Karl Pearson’s Correlation Coefficient Formula
Karl Pearson’s correlation coefficient is calculated as the covariance of the two variables divided by the product of the standard deviation of each data sample. It is the normalization of the covariance between the two variables to give an interpretable score.
Karl Pearson’s correlation coefficient formula is given below:

\(r = \frac{Cov(x, y)}{\sigma_x\sigma_y}\)

where \(\sigma_x\) and \(\sigma_y\) are the standard deviations of the x and y samples.

Covariance Formula:

\(Cov(x, y) = \frac{\sum(x - \bar{x})(y - \bar{y})}{n}\)
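As a quick illustration, this formula can be sketched in Python. This is a minimal example of our own (the function name `pearson_r` is not from the article), using population formulas that divide by n:

```python
import math

def pearson_r(x, y):
    """Pearson's r: the covariance of x and y divided by the product
    of their standard deviations (population forms, dividing by n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)

# A perfectly linear increasing pair gives r = +1
print(pearson_r([10, 20, 30], [40, 60, 80]))  # ≈ 1.0
```

Dividing the covariance by both standard deviations is what normalizes the score into the interval [-1, +1].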
How to Calculate Karl Pearson’s Coefficient of Correlation
There are 4 methods to calculate Karl Pearson’s Coefficient of Correlation which are given below:
- Actual Mean Method
- Assumed Mean Method
- Step Deviation Method
- Direct Method
Calculate Karl Pearson’s Coefficient of Correlation using Actual Mean Method
In the actual mean method, deviations are measured from the actual means of the two series. The actual mean is found by adding up all the values and dividing by how many there are; in other words, it is the sum divided by the count. The coefficient is then \(r = {\sum(x - \bar{x})(y - \bar{y})\over{\sqrt{\sum(x - \bar{x})^2}\sqrt{\sum(y - \bar{y})^2}}}\).
Calculate Karl Pearson’s Coefficient of Correlation using Assumed Mean Method
If the given data set is large, this method is recommended over the direct method for calculating the mean, since it reduces the calculations and keeps the numerical values small. Under the assumed mean method, the correlation coefficient is calculated from deviations taken from assumed means, where dx = deviations of x from its assumed mean and dy = deviations of y from its assumed mean. Pearson’s coefficient of correlation always lies between -1 and +1.
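A sketch of the assumed mean method in Python (our own illustration, not from the article; `ax` and `ay` are arbitrary assumed means, and any choice yields the same r):

```python
import math

def pearson_assumed_mean(x, y, ax, ay):
    """Pearson's r from deviations dx = x - ax and dy = y - ay,
    taken from assumed means ax, ay rather than the actual means."""
    n = len(x)
    dx = [xi - ax for xi in x]
    dy = [yi - ay for yi in y]
    num = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = (math.sqrt(sum(a * a for a in dx) - sum(dx) ** 2 / n)
           * math.sqrt(sum(b * b for b in dy) - sum(dy) ** 2 / n))
    return num / den

x, y = [1, 2, 3, 4], [2, 4, 5, 8]
# Any assumed means give the same r, including 0
print(round(pearson_assumed_mean(x, y, 2, 5), 4),
      round(pearson_assumed_mean(x, y, 0, 0), 4))  # 0.9812 0.9812
```

Choosing ax = ay = 0 reduces this to the raw-score form of the formula, which is why the assumed means drop out of the final value.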
Calculate Karl Pearson’s Coefficient of Correlation using Step Deviation Method
The step deviation method is an extension of the assumed mean (short-cut) method for obtaining the mean of large values: the deviations are divided by a common factor, reducing them to smaller values. The step deviation method is therefore also called the change of origin and scale method. To calculate the Pearson product-moment correlation by this method, first determine the covariance of the two variables, then calculate each variable’s standard deviation; the correlation coefficient is the covariance divided by the product of the two standard deviations.
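The step deviation idea can be sketched as follows (an illustrative example of ours; the common factors 10 and 50 are chosen to match the sample deviations):

```python
import math

def r_from_deviations(dx, dy):
    """Assumed-mean formula for r applied to deviation columns dx, dy."""
    n = len(dx)
    num = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = (math.sqrt(sum(a * a for a in dx) - sum(dx) ** 2 / n)
           * math.sqrt(sum(b * b for b in dy) - sum(dy) ** 2 / n))
    return num / den

dx = [-20, -10, 0, 10, 30]    # deviations from an assumed mean of x
dy = [-100, -50, 0, 50, 100]  # deviations from an assumed mean of y

# Step deviation: divide each column by a common factor to shrink the numbers
dx_step = [d // 10 for d in dx]
dy_step = [d // 50 for d in dy]

print(round(r_from_deviations(dx, dy), 4),
      round(r_from_deviations(dx_step, dy_step), 4))  # identical values
```

Dividing both columns by positive constants scales the numerator and denominator by the same factor, so r is unchanged; this is the change-of-scale property in action.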
Calculate Karl Pearson’s Coefficient of Correlation using Direct Method
Steps involved in calculating Karl Pearson’s coefficient of correlation by the direct method:
- Calculate the mean values of x and y.
- Calculate the deviations of the x-series values from their mean.
- Square these deviations.
- Calculate the deviations of the y-series values from their mean.
- Square these deviations.
- Multiply each deviation of the X series by the corresponding deviation of the Y series.
- Use the formula for calculating the correlation coefficient.
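These steps can be sketched in Python (our own illustration; note that the cross-product step multiplies the deviations themselves, not their squares, which is what preserves the sign of r):

```python
import math

def pearson_direct(x, y):
    """Pearson's r computed step by step from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n           # step 1: means
    dev_x = [a - mean_x for a in x]                   # step 2: deviations of x
    sq_x = sum(d * d for d in dev_x)                  # step 3: squared, summed
    dev_y = [b - mean_y for b in y]                   # step 4: deviations of y
    sq_y = sum(d * d for d in dev_y)                  # step 5: squared, summed
    cross = sum(a * b for a, b in zip(dev_x, dev_y))  # step 6: cross-products
    return cross / math.sqrt(sq_x * sq_y)             # step 7: the formula

print(pearson_direct([2, 4, 6, 8], [1, 3, 5, 7]))  # 1.0
```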
Types of Correlation Coefficient
The correlation coefficient shows how strongly two variables are related and in what direction. Based on this, there are three main types:
1. Positive Correlation (between 0 and +1)
This means that both variables move in the same direction.
If one value increases, the other also increases.
Example: The more time you spend exercising, the more calories you burn.
2. Negative Correlation (between 0 and -1)
Here, the two variables move in opposite directions.
If one value goes up, the other comes down.
Example: As the price of a product increases, the demand for it usually decreases.
3. Zero Correlation (exactly 0)
This means there is no linear relationship between the two variables.
A change in one does not affect the other.
Example: A person’s height has nothing to do with their intelligence.
Change of Scale and Origin in Correlation
The correlation coefficient tells us how strongly two variables are related. A key point to remember is that this value does not change if we change the scale or origin of the data.
- Change of Origin: If you add or subtract the same number from all values in a data set, it won’t change the correlation. For example, if you add 10 to every number in a set, the correlation stays the same.
- Change of Scale: If you multiply or divide all values by the same positive number, the correlation coefficient also stays unchanged. For example, if you double all the values in a set, the correlation between the variables does not change. (Multiplying by a negative number flips the sign of the coefficient, though its strength is unchanged.)
Example:
Let’s say we have two variables:
- X = [10, 20, 30]
- Y = [40, 60, 80]
The correlation between X and Y is 1 (perfect positive).
Now, apply changes:
- Change of Origin: Add 5 to each value of X → X becomes [15, 25, 35]. The correlation with Y is still 1.
- Change of Scale: Multiply each value of Y by 2 → Y becomes [80, 120, 160]. Again, the correlation with X is still 1.
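The worked example can be checked numerically with a small sketch of ours using the same data:

```python
import math

def pearson_r(x, y):
    """Direct (actual mean) form of Pearson's r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

X, Y = [10, 20, 30], [40, 60, 80]
X_shift = [x + 5 for x in X]   # change of origin: add 5 to every x
Y_scale = [2 * y for y in Y]   # change of scale: double every y

print(pearson_r(X, Y),
      pearson_r(X_shift, Y),
      pearson_r(X_shift, Y_scale))  # all 1.0
```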
Characteristics of Karl Pearson's Coefficient of Correlation
- Range of Values: The value of this coefficient lies between -1 and +1.
  - A value close to +1 means a strong positive relationship.
  - A value close to -1 means a strong negative relationship.
  - A value of 0 means there is no linear relationship between the variables.
- Shows Direction of Relationship
  - If the coefficient is positive, it means when one variable increases, the other also increases.
  - If the coefficient is negative, it means when one variable increases, the other decreases.
- Only for Linear Relationships: Karl Pearson’s method only works for straight-line relationships. It doesn’t explain relationships that are curved or non-linear.
- Independent of Units: The value of the correlation coefficient does not change with different units like cm, kg, etc. It is a pure number and only shows the strength and direction of the relation.
- Same in Either Direction: The correlation between X and Y is the same as the correlation between Y and X, so the order of variables does not affect the result.
Properties of Karl Pearson’s Coefficient of Correlation
Karl Pearson’s coefficient of correlation shows the following properties with proof:
Property 1: Karl Pearson’s Coefficient of Correlation (r) lies between -1 and 1, i.e., \(-1 \leq r \leq 1\).
Proof: Suppose X and Y are two variables with means \({\bar{x}}, {\bar{y}}\) and standard deviations \({\sigma_x}, {\sigma_y}\) respectively.
\(\begin{matrix}
\text{ Let us consider, }\\
\sum\left[{x-\bar{x}\over{\sigma_x}} \pm {y-\bar{y}\over{\sigma_y}}\right]^2\geq{0}\\
\sum\left[\left({x-\bar{x}\over{\sigma_x}}\right)^2 + \left({y-\bar{y}\over{\sigma_y}}\right)^2 \pm 2{(x-\bar{x})(y-\bar{y})\over{\sigma_x\sigma_y}}\right]\geq{0}\\
{1\over{\sigma_x^2}}\sum(x-\bar{x})^2 + {1\over{\sigma_y^2}}\sum(y-\bar{y})^2 \pm {2\over{\sigma_x\sigma_y}}\sum(x-\bar{x})(y-\bar{y})\geq{0}\\
\text{ Dividing both sides by n, we get }\\
{1\over{\sigma_x^2}}{\sum(x-\bar{x})^2\over{n}} + {1\over{\sigma_y^2}}{\sum(y-\bar{y})^2\over{n}} \pm {2\over{\sigma_x\sigma_y}}{\sum(x-\bar{x})(y-\bar{y})\over{n}}\geq{0}\\
{1\over{\sigma_x^2}}\sigma_x^2 + {1\over{\sigma_y^2}}\sigma_y^2 \pm {2\over{\sigma_x\sigma_y}}\,cov(x,y)\geq{0}\\
1 + 1 \pm{2r} \geq{0}\\
2 \pm{2r} \geq{0}\\
2 (1 \pm{r}) \geq{0}\\
(1 \pm{r}) \geq{0}\\
\text{ Both } (1 + {r}) \geq{0} \text{ and } (1 - {r}) \geq{0}\\
r \geq -1 \text{ and } r \leq 1\\
\therefore -1\leq{r}\leq1
\end{matrix}\)
The least value of r is –1 and the most is +1. If r = +1, there is a perfect positive correlation between the two variables. If r = -1, there is a perfect negative correlation.
If r = 0, then there is no linear relation between the variables. However, there may be a non-linear relationship between the variables.
If r is positive but close to zero, there is a weak positive correlation, and if it is close to +1, there is a strong positive correlation.
Property 2: Correlation coefficient is independent of change in origin and scale
Proof: Suppose X and Y are the original variables and, after changing origin and scale, we have
\(\begin{matrix}
U = {X - a \over{h}} \text{ and } V = {Y - b \over{k}} \text{ where a, b, h, k are constants and } h > 0, k > 0.\\
X - a = hU \text{ and } Y - b = kV\\
X = a + hU \text{ and } Y = b + kV\\
\bar{X} = a + h\bar{U} \text{ and } \bar{Y} = b + k\bar{V}\\
X - \bar{X} = h(U - \bar{U}) \text{ and } Y - \bar{Y} = k(V - \bar{V}) \\
\text{ Now, } r_{xy} = {\sum(x - \bar{x})(y - \bar{y})\over{\sqrt{\sum(x - \bar{x})^2}\sqrt{\sum(y - \bar{y})^2}}}\\
r_{xy} = {hk\sum(U - \bar{U})(V - \bar{V})\over{h\sqrt{\sum(U - \bar{U})^2}\,k\sqrt{\sum(V - \bar{V})^2}}}\\
r_{xy} = {\sum(U - \bar{U})(V - \bar{V})\over{\sqrt{\sum(U - \bar{U})^2}\sqrt{\sum(V - \bar{V})^2}}}\\
\therefore r_{xy} = r_{uv}
\end{matrix}\)
Property 3: Two independent variables are uncorrelated but the converse is not true
Proof: If two variables are independent then their covariance is zero, i.e., cov (X, Y) = 0
Thus, if two variables are independent, their coefficient of correlation is zero, i.e., independent variables are uncorrelated.
But the converse is not true: if r = 0, the variables may still be related in a non-linear way, so uncorrelated variables are not necessarily independent.
A correlation coefficient is a pure number independent of the unit of measurement.
The correlation coefficient is symmetric.
Advantages of Karl Pearson's Coefficient of Correlation
- Simple to Use
The formula is easy to understand and calculate using basic math or a calculator.
- Shows Strength and Direction
It tells you not just how strong the link between two variables is, but also if it is positive (both increase or decrease together) or negative (one increases while the other decreases).
- Unit-Free
It does not depend on the units (like cm, kg, etc.), so it’s useful for comparing data from different sources.
- Same Result Either Way
The result is the same whether you compare X with Y or Y with X. This makes the method fair and consistent.
- Widely Accepted
It’s one of the most commonly used methods in statistics and is trusted across many fields like economics, science, and social studies.
Disadvantages of Karl Pearson's Coefficient of Correlation
- Only Works for Straight-Line Relationships
It can only measure linear (straight-line) relationships. If the link between the variables is curved, this method won’t give the right result.
- Affected by Outliers
Unusual or extreme values in the data can change the result and give a false impression of the relationship.
- Needs Normal Data
The method assumes that the data is normally distributed, which is not always true in real-life data.
- No Cause and Effect
Even if two things are correlated, it does not mean one causes the other. The method only shows a connection, not a reason.
- Not for Categorical Data
It can only be used when the data is in numbers. It doesn't work with categories like colors, brands, or names.
Solved Examples of Karl Pearson’s Correlation Coefficient
Example 1: Compute the correlation coefficient between x and y, given n = 10, Σx = 40, Σy = 50, Σx² = 200, Σy² = 262 and Σxy = 220.
Solution: \(\begin{matrix}
\text{ The formula to find the Pearson correlation coefficient is }\\
r = r_{xy} = \frac{Cov(x, y)}{S_x\times{S_y}}\\
Cov(x, y) = {\sum{xy}\over{n}} - \bar{x}\,\bar{y}\\
\bar{x} = {\sum{x}\over{n}} = {40\over{10}} = 4\\
\bar{y} = {\sum{y}\over{n}} = {50\over{10}} = 5\\
Cov(x, y) = {220\over{10}} - 4 \times 5 = 22 - 20 = 2\\
S_x = \sqrt{{\sum{x^2}\over{n}} - (\bar{x})^2} = \sqrt{{200\over{10}} - 16} = \sqrt{4} = 2\\
S_y = \sqrt{{\sum{y^2}\over{n}} - (\bar{y})^2} = \sqrt{{262\over{10}} - 25} = \sqrt{1.2} = 1.0954\\
r = {2\over{2\times1.0954}} = {2\over{2.1908}} = 0.91
\end{matrix}\)
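The arithmetic in Example 1 can be checked with a few lines of Python working directly from the summary values (n = 10, Σx = 40, Σy = 50, Σxy = 220, Σx² = 200, Σy² = 262):

```python
import math

n = 10
sum_x, sum_y = 40, 50
sum_xy, sum_x2, sum_y2 = 220, 200, 262

mean_x, mean_y = sum_x / n, sum_y / n        # 4 and 5
cov = sum_xy / n - mean_x * mean_y           # 22 - 20 = 2
sd_x = math.sqrt(sum_x2 / n - mean_x ** 2)   # sqrt(20 - 16) = 2
sd_y = math.sqrt(sum_y2 / n - mean_y ** 2)   # sqrt(26.2 - 25) ≈ 1.0954
r = cov / (sd_x * sd_y)
print(round(r, 2))  # 0.91
```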
Example 2: Find Karl Pearson’s Correlation Coefficient by the assumed mean method, given N = 8, Σdx = 47, Σdy = 108, Σdx² = 1475, Σdy² = 3468 and Σdxdy = 2116, where dx and dy are deviations from the assumed means.
Solution:
\(\begin{matrix}
r = {{\sum{dxdy} - {\sum{dx}\times\sum{dy}\over{N}}}\over{\sqrt{\sum{dx^2} - {(\sum{dx})^2\over{N}}}\times\sqrt{\sum{dy^2} - {(\sum{dy})^2\over{N}}}}}\\
r = {{2116 - {47\times108\over{8}}}\over{\sqrt{1475 - {47^2\over{8}}}\times\sqrt{3468 - {108^2\over{8}}}}}\\
r = {2116 - 634.5 \over{\sqrt{1475 - 276.125} \times\sqrt{3468 - 1458}}}\\
r = {1481.5 \over{\sqrt{1198.875} \times\sqrt{2010}}}\\
r = {1481.5 \over{34.62\times44.83}}\\
r = {1481.5 \over{1552.01}}\\
r \approx 0.955
\end{matrix}\)
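Likewise, Example 2 can be verified from its summary values. Note that carrying the square roots unrounded gives r ≈ 0.954, a whisker below the 0.955 obtained when the roots are first rounded to two decimals:

```python
import math

N = 8
sum_dx, sum_dy = 47, 108
sum_dxdy, sum_dx2, sum_dy2 = 2116, 1475, 3468

num = sum_dxdy - sum_dx * sum_dy / N            # 2116 - 634.5 = 1481.5
den = (math.sqrt(sum_dx2 - sum_dx ** 2 / N)     # sqrt(1198.875)
       * math.sqrt(sum_dy2 - sum_dy ** 2 / N))  # sqrt(2010)
r = num / den
print(round(r, 3))  # 0.954 with unrounded roots
```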
FAQs For Karl Pearson Coefficient of Correlation
What is Karl Pearson's Correlation Coefficient?
Karl Pearson's Correlation Coefficient is used in statistics to summarize the strength of the linear relationship between two data samples.
What are 3 assumptions of Karl Pearson’s coefficient of correlation?
There are 3 assumptions of Karl Pearson’s coefficient of correlation: (1) the variables x and y are linearly related; (2) there is a cause-and-effect relationship between the factors affecting the values of x and y; (3) the random variables x and y are normally distributed.
What is Karl Pearson's Correlation Coefficient formula?
Karl Pearson's correlation coefficient formula is \(r = \frac{Cov(x, y)}{\sigma_x\sigma_y}\): the covariance of the two variables divided by the product of their standard deviations.
How to calculate Karl Pearson’s Coefficient of Correlation?
There are 4 methods to calculate Karl Pearson’s Coefficient of Correlation: the Actual Mean Method, the Assumed Mean Method, the Step Deviation Method, and the Direct Method.
What are the properties of Karl Pearson’s Coefficient of Correlation?
Karl Pearson’s coefficient of correlation shows the following properties: (1) r lies between -1 and +1; (2) r is independent of change of origin and scale; (3) two independent variables are uncorrelated, but the converse is not true.
What is the range of the coefficient?
The coefficient r ranges from -1 to +1: +1 indicates a perfect positive correlation, 0 indicates no correlation, and -1 indicates a perfect negative correlation.
When is Pearson’s correlation used?
Pearson’s correlation is used to measure the strength and direction of a linear relationship between two quantitative variables, in fields like psychology, economics, biology, and finance.
What are the assumptions for using Pearson’s r?
Variables are measured on interval or ratio scales. The relationship between variables is linear. Data should be approximately normally distributed. There should be no significant outliers.