Correlation Coefficient Calculator

Find Pearson r, R-squared, the regression line, and p-value for any two-variable dataset.

What is the Correlation Coefficient Calculator?

The Pearson correlation coefficient (r) is a number between -1 and +1 that measures how strongly two continuous variables move together in a linear pattern. A value of +1 means a perfect positive linear relationship (when X increases, Y increases by a proportional amount). A value of -1 means a perfect negative linear relationship (when X increases, Y decreases proportionally). A value of 0 means no linear relationship exists between the two variables.

Correlation analysis is used in virtually every quantitative field. Medical researchers use it to study the relationship between a risk factor (e.g. blood pressure) and an outcome (e.g. incidence of stroke). Economists study the correlation between GDP growth and unemployment rates (Okun's Law). Engineers correlate operating temperature with equipment failure rates. Market analysts look at the correlation between two asset prices to assess diversification benefits. Education researchers measure the correlation between study hours and exam performance. In each case, the goal is to quantify how reliably one variable predicts the other.

This calculator computes Pearson r along with four additional outputs that together give a complete picture of the linear relationship. R-squared (R²) shows the proportion of variance in Y explained by X. The least-squares regression line y = mx + b gives the best linear prediction of Y from X and can be used to make forecasts. The t-statistic and two-tailed p-value test whether the observed correlation is statistically significant or could have occurred by chance. All five outputs are computed automatically from raw data pairs.

The Summary Statistics mode is useful when you already have pre-computed sums from a textbook problem or research paper. Instead of re-entering raw data, you enter n, Σx, Σy, Σx², Σy², and Σxy to compute r, the regression line, and the significance test directly. This matches the hand-calculation procedure taught in introductory statistics courses and lets you verify published results or complete homework problems efficiently.

Formula

r  =  (nΣxy − ΣxΣy) ÷ √[(nΣx² − (Σx)²) × (nΣy² − (Σy)²)]
n = number of (x, y) data pairs
Σxy = sum of all products xi × yi
Σx, Σy = sum of all X values and all Y values
Σx², Σy² = sum of all squared X values and squared Y values
Example: For n = 10, Σx = 110, Σy = 169, Σx² = 1540, Σy² = 3599, Σxy = 2352: r = (23520 − 18590) ÷ √(3300 × 7429) = 4930 ÷ 4951.3 = 0.9957
slope  =  (nΣxy − ΣxΣy) ÷ (nΣx² − (Σx)²)     intercept  =  (Σy − slope × Σx) ÷ n
Regression line: y = (slope) × x + (intercept)
t-statistic: t = r × √(n − 2) ÷ √(1 − r²) with degrees of freedom = n − 2
R² = r² = proportion of Y variance explained by X (0 to 1)
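The formulas above can be sketched in a few lines of Python. This is an illustration only, not the calculator's actual source; the function name `correlation_from_sums` and the dictionary keys are made up for this example.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r, R-squared, slope, intercept and t from the six summary sums."""
    cov_term = n * sum_xy - sum_x * sum_y   # numerator shared by r and the slope
    var_x_term = n * sum_x2 - sum_x ** 2    # n*Σx² − (Σx)²
    var_y_term = n * sum_y2 - sum_y ** 2    # n*Σy² − (Σy)²
    if var_x_term <= 0 or var_y_term <= 0:
        raise ValueError("X and Y must each contain at least two distinct values")
    r = cov_term / math.sqrt(var_x_term * var_y_term)
    slope = cov_term / var_x_term
    intercept = (sum_y - slope * sum_x) / n
    # The t formula divides by sqrt(1 − r²), so a perfect correlation is
    # reported as an infinite t instead of raising a division error.
    if abs(r) >= 1:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return {"r": r, "r_squared": r * r, "slope": slope, "intercept": intercept, "t": t}

# Sums from the worked example in the formula section
result = correlation_from_sums(10, 110, 169, 1540, 3599, 2352)
print({name: round(value, 4) for name, value in result.items()})
```

The numerator nΣxy − ΣxΣy is computed once because both r and the slope use it.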

How to Use This Calculator

Steps

1. Choose Raw Data or Summary Stats mode - Select Raw Data to paste your X and Y values directly. Select Summary Stats if you already have pre-computed sums (n, Σx, Σy, Σx², Σy², Σxy) from a textbook or study report.
2. Enter your data - In Raw Data mode, type or paste X values in the first box and Y values in the second box, separated by commas or spaces. Make sure X and Y have the same number of values, in matched order. In Summary Stats mode, fill in the six fields.
3. Click Calculate - Click Calculate to get Pearson r, R-squared, the regression equation, t-statistic, two-tailed p-value, and an automatic interpretation of the correlation strength.
4. Read the interpretation and regression equation - Check the interpretation label (Very Strong / Strong / Moderate / Weak / Negligible) and direction (positive or negative). Use the regression equation y = mx + b to predict Y values for any X.
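For readers who want to reproduce the Raw Data step by hand, the sketch below shows one way comma- or space-separated input could be parsed and reduced to the six sums used in Summary Stats mode. The helper names `parse_values` and `summary_sums` are hypothetical, and the calculator's own parsing rules may differ.

```python
import re

def parse_values(text):
    """Split a comma- or space-separated string into a list of floats."""
    tokens = [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]
    return [float(tok) for tok in tokens]

def summary_sums(xs, ys):
    """Reduce matched X and Y lists to n, Σx, Σy, Σx², Σy², Σxy."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values")
    if len(xs) < 3:
        raise ValueError("at least 3 pairs are needed so that df = n - 2 is positive")
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }

# Mixed separators are accepted: commas for X, spaces for Y
xs = parse_values("2, 4, 6, 8, 10, 12, 14, 16, 18, 20")
ys = parse_values("3 7 8 14 15 19 21 25 27 30")
sums = summary_sums(xs, ys)
print(sums)
```

The length check guards against the most common input mistake, X and Y lists of different sizes.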

Example Calculations

Example 1 - Very Strong Positive Correlation (Income vs Spending)

Weekly income ($000s): 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 and spending ($000s): 3, 7, 8, 14, 15, 19, 21, 25, 27, 30

1. n = 10. Σx = 110, Σy = 169, Σx² = 1540, Σy² = 3599, Σxy = 2352.
2. r = (10 × 2352 − 110 × 169) ÷ √((10×1540−110²) × (10×3599−169²)) = 4930 ÷ √(3300 × 7429) = 4930 ÷ 4951.3 = 0.9957
3. R² = 0.9914. Slope = 4930/3300 = 1.4939. Intercept = (169 − 1.4939×110)/10 = 0.467. Regression: y = 1.4939x + 0.467. t = 0.9957×√8/√(1−0.9914) = 30.4, p < 0.0001.
r = 0.9957, R² = 0.9914. Very strong positive linear relationship. Highly significant.

Example 2 - Very Strong Negative Correlation (Temperature vs Heating Oil)

Temperature (°C): 5, 8, 12, 15, 20, 22, 25, 28, 30, 35 and heating oil used (liters): 90, 85, 75, 68, 55, 50, 38, 30, 22, 10

1. As temperature rises, heating oil usage falls. We expect a negative r.
2. n = 10. Σx = 200, Σy = 523, Σx² = 4876, Σy² = 34027, Σxy = 8050. nΣxy − ΣxΣy = 80500 − 104600 = −24100.
3. denom = √((48760−40000)×(340270−273529)) = √(8760×66741) = √(584,651,160) = 24179.6. r = −24100/24179.6 = −0.9967.
r = −0.9967, R² = 0.9934. Very strong negative linear relationship. Highly significant.

Example 3 - Summary Statistics Mode (Exercise vs GPA)

A study of 25 students gave: n=25, SumX=87.5, SumY=82.5, SumX2=340, SumY2=295, SumXY=312

1. X = weekly exercise hours, Y = GPA on a 4.0 scale (scaled to match units). Switch to Summary Stats mode and enter the six values.
2. nΣxy − ΣxΣy = 25×312 − 87.5×82.5 = 7800 − 7218.75 = 581.25.
3. ssX = 25×340 − 87.5² = 8500 − 7656.25 = 843.75. ssY = 25×295 − 82.5² = 7375 − 6806.25 = 568.75. r = 581.25 / √(843.75×568.75) = 581.25/692.74 = 0.8391.
r = 0.8391, R² = 0.7040. Strong positive linear relationship. Statistically significant.

Example 4 - Negligible Correlation (Age vs Job Satisfaction)

Ages: 30, 42, 55, 28, 65, 38, 51, 74, 45, 60 and job satisfaction (0-100): 72, 85, 78, 90, 82, 68, 75, 80, 71, 88

1. We are testing whether age predicts job satisfaction. The data has a lot of scatter around any potential trend.
2. n = 10. Σx = 488, Σy = 789. Computing all sums: nΣxy − ΣxΣy = 386740 − 385032 = 1708.
3. ssX = 20296, ssY = 4989. r = 1708 / √(20296×4989) = 1708 / 10062.6 = 0.1697. t = 0.1697×√8/√(1−0.0288) = 0.480/0.985 = 0.487. p = 0.64 (not significant).
r = 0.1697, R² = 0.0288. Negligible positive correlation. Not statistically significant.

Frequently Asked Questions

What does a Pearson r of 0.9957 mean?
An r of 0.9957 means a very strong positive linear relationship between the two variables. R² = 0.9914, so the linear model explains 99.14% of the variance in Y. This level of correlation is unusually high in real-world data and suggests either a genuine near-perfect linear law (common in physics or engineering), very clean controlled experimental data, or that the two variables are measuring essentially the same thing in different units.
What is the formula for Pearson r?
The computational formula is r = (n×Σxy − Σx×Σy) / sqrt((n×Σx² − (Σx)²) × (n×Σy² − (Σy)²)). This is equivalent to the definitional formula r = Σ((xi − x̄)(yi − ȳ)) / ((n−1)×sx×sy), where x̄ and ȳ are the sample means and sx and sy are the sample standard deviations. Both give the same result. This calculator uses the computational form, which is numerically efficient and can be applied directly to summary statistics.
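The equivalence of the two forms can be checked numerically. The sketch below applies both to the income and spending data from Example 1; the function names are illustrative, and `statistics.stdev` from the Python standard library supplies the sample standard deviations (n − 1 in the denominator).

```python
import math
import statistics

def r_computational(xs, ys):
    """Pearson r from raw sums (the computational form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

def r_definitional(xs, ys):
    """Pearson r from deviations about the means (the definitional form)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

income = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
spending = [3, 7, 8, 14, 15, 19, 21, 25, 27, 30]
r_a = r_computational(income, spending)
r_b = r_definitional(income, spending)
print(round(r_a, 4), round(r_b, 4))
```

Both values should agree to floating-point precision. With very large values and very small variance, the computational form can lose accuracy to cancellation, so the deviation-based form is sometimes preferred in numerical code.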
How do I interpret the Pearson r value?
Standard interpretation guidelines: |r| >= 0.9 = very strong, |r| >= 0.7 = strong, |r| >= 0.5 = moderate, |r| >= 0.3 = weak, |r| < 0.3 = negligible. The sign tells you direction: positive means both variables increase together; negative means one increases as the other decreases. These thresholds are general guidelines. In physics r > 0.999 may be expected; in social science, r = 0.5 is often considered strong; in psychology, r = 0.3 is sometimes acceptable.
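As a rough sketch, the thresholds above translate into a small lookup. The function name `interpret_r` and the returned tuple shape are assumptions made for illustration; the cutoffs are the general guidelines quoted in this answer, not a universal standard.

```python
def interpret_r(r):
    """Return (strength label, direction) using the guideline cutoffs above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude >= 0.9:
        strength = "Very Strong"
    elif magnitude >= 0.7:
        strength = "Strong"
    elif magnitude >= 0.5:
        strength = "Moderate"
    elif magnitude >= 0.3:
        strength = "Weak"
    else:
        strength = "Negligible"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

for value in (0.9957, -0.75, 0.5, -0.3, 0.17):
    print(value, interpret_r(value))
```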
What is R-squared and how does it differ from r?
R-squared (R² = r²) is the coefficient of determination, the proportion of variance in Y explained by the linear relationship with X. It ranges from 0 to 1. For r = 0.80, R² = 0.64 means X explains 64% of the variability in Y. r tells you both strength and direction (-1 to +1), while R² only tells you strength (0 to 1). Both are shown by this calculator because r is the standard correlation measure while R² directly quantifies explanatory power.
What does the p-value in a correlation test tell me?
The p-value tests the null hypothesis that the population correlation rho = 0 (no linear relationship). A small p-value (typically p < 0.05) means the observed r is unlikely to occur by chance if the true correlation were zero, so you conclude that a statistically significant linear relationship exists. The test uses the t-statistic t = r×sqrt(n−2)/sqrt(1−r²) with n−2 degrees of freedom. This calculator reports the two-tailed p-value.
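The test described here can be approximated with the Python standard library alone. The sketch below gets the two-tailed p-value by numerically integrating the Student t density with Simpson's rule. That is a technique chosen for this illustration, not necessarily how the calculator computes it, and the function names are hypothetical. A statistics library's t-distribution routine would normally be used instead.

```python
import math

def t_density(u, df):
    """Student t probability density with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(u * u / df))

def two_tailed_p(t, df, steps=20000):
    """P(|T| >= |t|), using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even step count
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * total * h / 3    # P(|T| <= |t|), by symmetry
    return max(0.0, 1.0 - central_area)

def correlation_p_value(r, n):
    """Return (t, two-tailed p) for testing rho = 0 with n pairs."""
    df = n - 2
    if abs(r) >= 1:
        return math.copysign(math.inf, r), 0.0
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_tailed_p(t, df)

t_small, p_small = correlation_p_value(0.7, 5)    # high r, tiny sample
t_large, p_large = correlation_p_value(0.3, 50)   # modest r, larger sample
print(round(p_small, 3), round(p_large, 3))
```

Because the p-value is computed as one minus the central area, very small p-values lose relative precision. That is acceptable for checking significance at conventional levels, but not for reporting extreme tail probabilities.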
Can a high r be meaningless if the sample size is small?
Yes. With n = 5 data points, r = 0.7 has a p-value of about 0.19 (not significant), because a high r can easily occur by chance with only 5 pairs. With n = 50, r = 0.3 is already statistically significant (p ≈ 0.03). Always check the p-value alongside r. A rule of thumb: you need at least n = 30 for stable correlation estimates. With n < 10, interpret r very cautiously regardless of its magnitude.
What is the difference between correlation and causation?
Correlation measures statistical association: two variables that tend to move together. Causation means one variable directly produces changes in the other. High correlation does not establish causation. Classic examples: ice cream sales and drowning rates are both driven by summer heat (confounding variable), not by each other. A high r should prompt investigation, not automatic causal inference. Establishing causation requires controlled experiments, proper study design, and ruling out confounders and reverse causation.
What is the regression line and how do I use it?
The regression line y = mx + b is the best linear predictor of Y given X. Slope m tells you: for every 1-unit increase in X, Y changes by m units on average. Intercept b is the predicted Y when X = 0 (only meaningful if X = 0 is within the range of data). To predict Y for a new X value: substitute X into the equation. Important: only predict within the observed range of X. Extrapolating beyond the data range can give wildly inaccurate predictions even when r is very high.
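A minimal sketch of that advice: the hypothetical helper below evaluates y = mx + b and also flags whether the requested X lies outside the observed range. The slope, intercept, and range are taken from Example 1 (incomes from 2 to 20).

```python
def predict_y(x, slope, intercept, x_min, x_max):
    """Return (predicted y, is_extrapolation) for the line y = slope*x + intercept."""
    y_hat = slope * x + intercept
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Example 1 regression line: y = 1.4939x + 0.467, observed X range 2 to 20
inside, inside_flag = predict_y(9, 1.4939, 0.467, 2, 20)
outside, outside_flag = predict_y(25, 1.4939, 0.467, 2, 20)
print(round(inside, 2), inside_flag)
print(round(outside, 2), outside_flag)
```

Returning the flag with the prediction leaves the caller to decide whether to warn, refuse, or proceed when X falls outside the data.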
When should I use Spearman rho instead of Pearson r?
Use Spearman rho instead of Pearson r when: (1) the data is ordinal (rankings, Likert scales) rather than continuous; (2) the relationship is monotonic but clearly non-linear; (3) the data contains outliers that would distort Pearson r; (4) normality cannot be assumed. Spearman rho is computed on the ranks of the data rather than raw values, making it more robust. For large samples with approximately normal, continuous data, Pearson r and Spearman rho usually give similar results.
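To illustrate what "computed on the ranks" means, the sketch below ranks each variable (tied values share their average rank) and then applies the ordinary Pearson formula to the ranks. The helper names are illustrative, and this page's calculator does not itself offer Spearman rho.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1        # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman rho: Pearson r applied to the average ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but clearly non-linear relationship: y = x cubed
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]
print(round(pearson_r(xs, ys), 4), round(spearman_rho(xs, ys), 4))
```

For this cubic data Spearman rho is 1 because the ordering is perfectly preserved, while Pearson r falls below 1 because the points do not lie on a straight line.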
How do I enter data in Summary Stats mode?
In Summary Stats mode, enter six pre-computed values: n (number of pairs), Σx (sum of X values), Σy (sum of Y values), Σx² (sum of each X squared), Σy² (sum of each Y squared), and Σxy (sum of each x×y product). This mode is useful for textbook problems that provide summary statistics, for verifying published r values, or when working with aggregate data from statistical software output. The full regression line and p-value are computed from these six numbers.