What is the Pearson correlation coefficient and what does it measure?

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two continuous variables. It ranges from -1 to +1. r = 1 means a perfect positive linear relationship (as X increases, Y increases proportionally). r = -1 means a perfect negative linear relationship. r = 0 means no linear relationship. Values between 0.7 and 1.0 (or -0.7 and -1.0) indicate strong linear associations commonly reported in scientific literature.

What does the p-value mean in a correlation test?

The p-value tests the null hypothesis that the true population correlation rho = 0 (no linear relationship). A small p-value (e.g. p < 0.05) means an r at least as extreme as the one observed would be unlikely if the true correlation were zero, so you reject the null and conclude that a statistically significant linear association exists. The test statistic is t = r * sqrt(n-2) / sqrt(1-r²), which follows a t-distribution with n-2 degrees of freedom under the null hypothesis. This calculator reports the two-tailed p-value.
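The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculator's actual code: the function names are hypothetical, and the two-tailed p-value is obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed. It assumes n > 2 and |r| < 1.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def correlation_p_value(r: float, n: int, steps: int = 20000):
    """Return (t, two-tailed p) for H0: rho = 0, given sample r and n pairs.

    steps must be even (Simpson's rule requirement).
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    # Integrate the density between -|t| and +|t|; the p-value is what is left
    # in the two tails.
    a, b = -abs(t), abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central_area = total * h / 3
    return t, max(0.0, 1 - central_area)

t, p = correlation_p_value(0.7, 5)
print(f"t = {t:.3f}, p = {p:.3f}")
```

A statistics library with a t-distribution CDF would normally replace the hand-rolled integration; it is written out here only to keep the sketch self-contained.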

How is the regression line related to the correlation coefficient?

The slope of the least-squares regression line is m = r * (sy / sx), where sy and sx are the standard deviations of Y and X. The regression line always passes through the point (x_mean, y_mean), so the intercept is b = y_mean - m * x_mean. The sign of the slope always matches the sign of r. R² tells you what fraction of Y's variance the regression line explains. This calculator shows both r and the regression equation y = mx + b so you can use the line to make predictions.
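As a quick check of the relationship between r and the slope, the sketch below computes r from the definitional formula and then derives the slope as r * (sy / sx). The data are the income and spending values from Example 1 on this page; the variable names are illustrative choices.

```python
import statistics

x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [3, 7, 8, 14, 15, 19, 21, 25, 27, 30]

x_mean, y_mean = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)  # sample standard deviations

# Pearson r from the definitional formula.
r = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / ((len(x) - 1) * sx * sy)

slope = r * sy / sx
intercept = y_mean - slope * x_mean  # forces the line through (x_mean, y_mean)

print(f"r = {r:.4f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Because sy and sx are both positive, the slope necessarily carries the same sign as r.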

What sample size do I need for a reliable correlation?

As a rule of thumb, n >= 30 gives reasonably stable estimates of r. With n = 10, the margin of error around r is very large. Detecting a correlation with 80% power at two-tailed alpha = 0.05 requires roughly n = 29 for r = 0.5, n = 85 for r = 0.3, and n = 783 for r = 0.1. Small samples frequently produce inflated r values by chance. Always report both r and n so readers can assess the reliability of your result.
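Sample-size figures of this kind can be approximated with the Fisher z transformation. The sketch below is one common approximation, not necessarily the method behind the numbers quoted above; the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist

def required_n(r: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate number of pairs needed to detect a true correlation r.

    Uses the Fisher z approximation: n = ((z_alpha/2 + z_power) / atanh(r))^2 + 3.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3

for r in (0.5, 0.3, 0.1):
    print(f"r = {r}: n is about {required_n(r):.1f}")
```

The required n grows roughly with 1 / r², which is why small correlations need samples in the hundreds.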

What is the difference between Pearson r and Spearman rho?

Pearson r measures the linear relationship between two continuous variables; its significance test additionally assumes the data are approximately bivariate normal. Spearman rho measures the monotonic (not necessarily linear) relationship between two variables based on their ranks. Use Spearman rho when: the data is ordinal (ratings, ranks), the relationship is clearly non-linear but monotonic, or when outliers are present and you want a more robust measure. This calculator computes Pearson r; use the Coefficient of Determination Calculator to explore R² further.

When should I not use Pearson r?

Pearson r is not appropriate when: (1) the relationship is non-linear (e.g. quadratic), as r can be near 0 even for a perfect curve; (2) the data contains extreme outliers, which can inflate or deflate r dramatically; (3) the variables are ordinal rather than continuous (use Spearman rho); (4) you are comparing groups rather than looking for a linear trend (use ANOVA); (5) the data has a restricted range (range restriction attenuates r).
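Points (1) and (2) are easy to demonstrate numerically. The following sketch uses small made-up datasets (purely illustrative) to show a perfect curve producing r = 0 and a single extreme point producing a large r from otherwise trendless data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the definitional (mean-centered) formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# (1) A perfect symmetric quadratic: y is fully determined by x, yet r is 0.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v * v for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# (2) Five points with no linear trend, then the same points plus one outlier.
x_base, y_base = [1, 2, 3, 4, 5], [3, 1, 4, 1, 3]
r_base = pearson_r(x_base, y_base)
r_outlier = pearson_r(x_base + [20], y_base + [20])

print(f"curve: {r_curve:.3f}, base: {r_base:.3f}, with outlier: {r_outlier:.3f}")
```

Plotting the data before computing r is the simplest way to catch both situations.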

Can r be negative if both variables are increasing?

Yes, if each variable takes increasing values in its own right but the two do not increase together across the same observations. For example, if X = distance from city center and Y = income, X increases as you move further out, but Y might decrease (inverse relationship), giving a negative r. The sign of r depends purely on whether the variables move in the same direction (positive) or opposite directions (negative) when examined together across all data points, not on whether each variable individually increases.

Correlation Coefficient Calculator

Q: What is the formula for Pearson r?

The computational formula is r = (n*Σxy - Σx*Σy) / sqrt((n*Σx² - (Σx)²) * (n*Σy² - (Σy)²)). The equivalent conceptual formula is r = Σ((xi - x_mean)(yi - y_mean)) / ((n-1)*sx*sy), where sx and sy are the sample standard deviations. Both formulas give the same result in exact arithmetic. This calculator uses the computational formula, which is efficient and works directly from summary sums, although it can lose precision to rounding when the values are very large relative to their spread.

Q: How do I interpret the Pearson r value?

Common interpretation guidelines: |r| >= 0.9 = very strong, |r| >= 0.7 = strong, |r| >= 0.5 = moderate, |r| >= 0.3 = weak, |r| < 0.3 = negligible. These thresholds vary by field: in physics, r > 0.99 may be expected; in social sciences, r = 0.5 is often considered strong.

Q: What is R-squared and how does it differ from r?

R-squared (R² = r²) is the coefficient of determination. It measures the proportion of variance in Y explained by the linear relationship with X. For example, r = 0.80 gives R² = 0.64, meaning 64% of Y's variability is explained by X. r is dimensionless and ranges from -1 to +1, while R² ranges from 0 to 1. R² is always non-negative and does not tell you the direction of the relationship, while r does.
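The "proportion of variance explained" reading can be verified directly: for a simple linear regression, 1 - SSE/SST equals r squared. The sketch below checks this on the Example 1 data from this page; the variable names are illustrative.

```python
x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [3, 7, 8, 14, 15, 19, 21, 25, 27, 30]

n = len(x)
x_mean, y_mean = sum(x) / n, sum(y) / n
sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
sxx = sum((a - x_mean) ** 2 for a in x)
syy = sum((b - y_mean) ** 2 for b in y)  # total sum of squares (SST)

r = sxy / (sxx * syy) ** 0.5
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Sum of squared residuals around the fitted line (SSE).
sse = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
r_squared = 1 - sse / syy

print(f"r^2 = {r * r:.4f}, 1 - SSE/SST = {r_squared:.4f}")
```

The two quantities coincide only for a least-squares line with one predictor; with several predictors R² is no longer the square of a single pairwise r.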

Find Pearson r, R-squared, the regression line, and p-value for any two-variable dataset.

📊 What is the Correlation Coefficient Calculator?

The Pearson correlation coefficient (r) is a number between -1 and +1 that measures how strongly two continuous variables move together in a linear pattern. A value of +1 means a perfect positive linear relationship (when X increases, Y increases by a proportional amount). A value of -1 means a perfect negative linear relationship (when X increases, Y decreases proportionally). A value of 0 means no linear relationship exists between the two variables.

Correlation analysis is used in virtually every quantitative field. Medical researchers use it to study the relationship between a risk factor (e.g. blood pressure) and an outcome (e.g. incidence of stroke). Economists study the correlation between GDP growth and unemployment rates (Okun's Law). Engineers correlate operating temperature with equipment failure rates. Market analysts look at the correlation between two asset prices to assess diversification benefits. Education researchers measure the correlation between study hours and exam performance. In each case, the goal is to quantify how reliably one variable predicts the other.

This calculator computes Pearson r along with four additional outputs that together give a complete picture of the linear relationship. R-squared (r²) shows the proportion of variance in Y explained by X. The least-squares regression line y = mx + b gives the best linear prediction of Y from X and can be used to make forecasts. The t-statistic and two-tailed p-value test whether the observed correlation is statistically significant or could have occurred by chance. All five outputs are computed automatically from raw data pairs.

The Summary Statistics mode is useful when you already have pre-computed sums from a textbook problem or research paper. Instead of re-entering raw data, you enter n, Σx, Σy, Σx², Σy², and Σxy to compute r, the regression line, and the significance test directly. This matches the hand-calculation procedure taught in introductory statistics courses and lets you verify published results or complete homework problems efficiently.

📐 Formula

r = (nΣxy − ΣxΣy) ÷ √[(nΣx² − (Σx)²) × (nΣy² − (Σy)²)]

n = number of (x, y) data pairs

Σxy = sum of all products x_i × y_i

Σx, Σy = sum of all X values and all Y values

Σx², Σy² = sum of all squared X values and squared Y values

Example: For n = 10, Σx = 110, Σy = 169, Σx² = 1540, Σy² = 3599, Σxy = 2352: r = (23520 − 18590) ÷ √(3300 × 7429) = 4930 ÷ 4951.3 = 0.9957

slope = (nΣxy − ΣxΣy) ÷ (nΣx² − (Σx)²)

intercept = (Σy − slope × Σx) ÷ n

Regression line: y = (slope) × x + (intercept)

t-statistic: t = r × √(n − 2) ÷ √(1 − r²) with degrees of freedom = n − 2

R² = r² = proportion of Y variance explained by X (0 to 1)
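The formulas in this section translate almost line for line into code. The following is a minimal sketch, assuming the six sums are already available; the function name and the returned dictionary keys are hypothetical and do not reflect the calculator's internals. It assumes n > 2 and |r| < 1 (the t-statistic is undefined when r is exactly ±1).

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Compute r, R^2, regression line and t from the six summary sums."""
    cov_term = n * sum_xy - sum_x * sum_y      # n*Σxy − Σx*Σy
    var_x_term = n * sum_x2 - sum_x ** 2       # n*Σx² − (Σx)²
    var_y_term = n * sum_y2 - sum_y ** 2       # n*Σy² − (Σy)²

    r = cov_term / math.sqrt(var_x_term * var_y_term)
    slope = cov_term / var_x_term
    intercept = (sum_y - slope * sum_x) / n
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return {"r": r, "r_squared": r * r, "slope": slope,
            "intercept": intercept, "t": t, "df": n - 2}

# Sums from the worked example in this section.
result = correlation_from_sums(10, 110, 169, 1540, 3599, 2352)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```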

📖 How to Use This Calculator

Steps

Choose Raw Data or Summary Stats mode - Select Raw Data to paste your X and Y values directly. Select Summary Stats if you already have pre-computed sums (n, Σx, Σy, Σx², Σy², Σxy) from a textbook or study report.

Enter your data - In Raw Data mode, type or paste X values in the first box and Y values in the second box, separated by commas or spaces. Make sure X and Y have the same number of values, in matched order. In Summary Stats mode, fill in the six fields.

Click Calculate - The calculator returns Pearson r, R-squared, the regression equation, t-statistic, two-tailed p-value, and an automatic interpretation of the correlation strength.

Read the interpretation and regression equation - Check the interpretation label (Very Strong / Strong / Moderate / Weak / Negligible) and direction (positive or negative). Use the regression equation y = mx + b to predict Y values for any X.
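The link between the two input modes can be sketched as follows: raw comma- or space-separated values are parsed into matched lists, then reduced to the six sums that Summary Stats mode expects. The function names are hypothetical and the parsing rules are an assumption about how such input might reasonably be handled.

```python
import re

def parse_values(text: str) -> list:
    """Split a string on commas and/or whitespace and convert to floats."""
    return [float(tok) for tok in re.split(r"[,\s]+", text.strip()) if tok]

def six_sums(xs, ys):
    """Return (n, Σx, Σy, Σx², Σy², Σxy) for matched lists of values."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must contain the same number of values")
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(a * a for a in xs),
        sum(b * b for b in ys),
        sum(a * b for a, b in zip(xs, ys)),
    )

xs = parse_values("2, 4, 6, 8, 10, 12, 14, 16, 18, 20")
ys = parse_values("3 7 8 14 15 19 21 25 27 30")
print(six_sums(xs, ys))
```

The length check matters because each x is paired with the y at the same position; a missing value would silently shift every later pair.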

💡 Example Calculations

Example 1 - Very Strong Positive Correlation (Income vs Spending)

Weekly income ($000s): 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 and spending ($000s): 3, 7, 8, 14, 15, 19, 21, 25, 27, 30

n = 10. Σx = 110, Σy = 169, Σx² = 1540, Σy² = 3599, Σxy = 2352.

r = (10 × 2352 − 110 × 169) ÷ √((10×1540−110²) × (10×3599−169²)) = 4930 ÷ √(3300 × 7429) = 4930 ÷ 4951.3 = 0.9957

R² = 0.9914. Slope = 4930/3300 = 1.4939. Intercept = (169 − 1.4939×110)/10 = 0.467. Regression: y = 1.4939x + 0.467. t = 0.9957×√8/√(1−0.9914) = 30.4, p < 0.0001.

r = 0.9957, R² = 0.9914. Very strong positive linear relationship. Highly significant.

Example 2 - Very Strong Negative Correlation (Temperature vs Heating Oil)

Temperature (C): 5, 8, 12, 15, 20, 22, 25, 28, 30, 35 and heating oil used (liters): 90, 85, 75, 68, 55, 50, 38, 30, 22, 10

As temperature rises, heating oil usage falls. We expect a negative r.

n = 10. Σx = 200, Σy = 523, Σx² = 4876, Σy² = 34027, Σxy = 8050. nΣxy − ΣxΣy = 80500 − 104600 = −24100.

denom = √((48760−40000)×(340270−273529)) = √(8760×66741) = √(584,651,160) = 24179.6. r = −24100/24179.6 = −0.9967.

r = −0.9967, R² = 0.9934. Very strong negative linear relationship. Highly significant.

Example 3 - Summary Statistics Mode (Exercise vs GPA)

A study of 25 students gave: n = 25, Σx = 87.5, Σy = 82.5, Σx² = 340, Σy² = 295, Σxy = 312

X = weekly exercise hours, Y = GPA on a 4.0 scale (scaled to match units). Switch to Summary Stats mode and enter the six values.

nΣxy − ΣxΣy = 25×312 − 87.5×82.5 = 7800 − 7218.75 = 581.25.

ssX = 25×340 − 87.5² = 8500 − 7656.25 = 843.75. ssY = 25×295 − 82.5² = 7375 − 6806.25 = 568.75. r = 581.25 / √(843.75×568.75) = 581.25/692.73 = 0.8391.

r = 0.8391, R² = 0.7040. Strong positive linear relationship. Statistically significant.

Example 4 - Negligible Correlation (Age vs Job Satisfaction)

Ages: 30, 42, 55, 28, 65, 38, 51, 74, 45, 60 and job satisfaction (0-100): 72, 85, 78, 90, 82, 68, 75, 80, 71, 88

We are testing whether age predicts job satisfaction. The data has a lot of scatter around any potential trend.

n = 10. Σx = 488, Σy = 789. Computing all sums: nΣxy − ΣxΣy = 386740 − 385032 = 1708.

ssX = 20296, ssY = 4989. r = 1708 / √(20296×4989) = 1708 / 10062.6 = 0.1697. t = 0.1697×√8/√(1−0.0288) = 0.480/0.985 = 0.487. p = 0.64 (not significant).

r = 0.1697, R² = 0.0288. Negligible positive correlation. Not statistically significant.

❓ Frequently Asked Questions

What does a Pearson r of 0.9957 mean?

An r of 0.9957 means a very strong positive linear relationship between the two variables. R² = 0.9914, so the linear model explains 99.14% of the variance in Y. This level of correlation is unusually high in real-world data and suggests either a genuine near-perfect linear law (common in physics or engineering), very clean controlled experimental data, or that the two variables are measuring essentially the same thing in different units.

What is the formula for Pearson r?

The computational formula is r = (n×Σxy − Σx×Σy) / sqrt((n×Σx² − (Σx)²) × (n×Σy² − (Σy)²)). This is equivalent to the definitional formula r = Σ((xi − x̄)(yi − ȳ)) / ((n−1)×sx×sy). Both give the same result. This calculator uses the computational form, which is numerically efficient and can be applied directly to summary statistics.

How do I interpret the Pearson r value?

Standard interpretation guidelines: |r| >= 0.9 = very strong, |r| >= 0.7 = strong, |r| >= 0.5 = moderate, |r| >= 0.3 = weak, |r| < 0.3 = negligible. The sign tells you direction: positive means both variables increase together; negative means one increases as the other decreases. These thresholds are general guidelines. In physics r > 0.999 may be expected; in social science, r = 0.5 is often considered strong; in psychology, r = 0.3 is sometimes acceptable.

What is R-squared and how does it differ from r?

R-squared (R² = r²) is the coefficient of determination, the proportion of variance in Y explained by the linear relationship with X. It ranges from 0 to 1. For r = 0.80, R² = 0.64 means X explains 64% of the variability in Y. r tells you both strength and direction (-1 to +1), while R² only tells you strength (0 to 1). Both are shown by this calculator because r is the standard correlation measure while R² directly quantifies explanatory power.

What does the p-value in a correlation test tell me?

The p-value tests the null hypothesis that the population correlation rho = 0 (no linear relationship). A small p-value (typically p < 0.05) means the observed r is unlikely to occur by chance if the true correlation were zero, so you conclude that a statistically significant linear relationship exists. The test uses the t-statistic t = r×sqrt(n−2)/sqrt(1−r²) with n−2 degrees of freedom. This calculator reports the two-tailed p-value.

Can a high r be meaningless if the sample size is small?

Yes. With n = 5 data points, r = 0.7 has a p-value of about 0.19 (not significant), because a high r can easily occur by chance with only 5 pairs. With n = 50, r = 0.3 is already statistically significant (p ≈ 0.03). Always check the p-value alongside r. A rule of thumb: you need at least n = 30 for stable correlation estimates. With n < 10, interpret r very cautiously regardless of its magnitude.
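One way to see how little a small sample pins down r is an approximate confidence interval based on the Fisher z transformation. This is an illustrative sketch, not something the calculator reports; the function name is hypothetical and the method assumes n > 3 and roughly bivariate normal data.

```python
import math
from statistics import NormalDist

def fisher_ci(r: float, n: int, confidence: float = 0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                  # transform r to the z scale
    se = 1 / math.sqrt(n - 3)          # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

for r, n in [(0.7, 5), (0.3, 50)]:
    lo, hi = fisher_ci(r, n)
    print(f"r = {r}, n = {n}: 95% CI roughly ({lo:.2f}, {hi:.2f})")
```

With n = 5 the interval for r = 0.7 spans zero, while with n = 50 the interval for r = 0.3 stays above zero, which matches the significance results described above.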

What is the difference between correlation and causation?

Correlation measures statistical association: two variables that tend to move together. Causation means one variable directly produces changes in the other. High correlation does not establish causation. Classic examples: ice cream sales and drowning rates are both driven by summer heat (confounding variable), not by each other. A high r should prompt investigation, not automatic causal inference. Establishing causation requires controlled experiments, proper study design, and ruling out confounders and reverse causation.

What is the regression line and how do I use it?

The regression line y = mx + b is the best linear predictor of Y given X. Slope m tells you: for every 1-unit increase in X, Y changes by m units on average. Intercept b is the predicted Y when X = 0 (only meaningful if X = 0 is within the range of data). To predict Y for a new X value: substitute X into the equation. Important: only predict within the observed range of X. Extrapolating beyond the data range can give wildly inaccurate predictions even when r is very high.
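The advice about staying within the observed range can be built into a small helper. The sketch below is hypothetical (the function and its parameters are not part of the calculator) and uses the Example 1 line y = 1.4939x + 0.467, whose X values run from 2 to 20.

```python
def predict(x_new: float, slope: float, intercept: float,
            x_min: float, x_max: float) -> float:
    """Predict Y from the regression line, refusing to extrapolate."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    return slope * x_new + intercept

# Interpolating inside the data range is fine.
print(predict(11, slope=1.4939, intercept=0.467, x_min=2, x_max=20))

# Extrapolating is rejected rather than silently returning a number.
try:
    predict(25, slope=1.4939, intercept=0.467, x_min=2, x_max=20)
except ValueError as err:
    print("refused:", err)
```

Raising an error is one design choice; a gentler variant could return the prediction together with a warning flag.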

When should I use Spearman rho instead of Pearson r?

Use Spearman rho instead of Pearson r when: (1) the data is ordinal (rankings, Likert scales) rather than continuous; (2) the relationship is monotonic but clearly non-linear; (3) the data contains outliers that would distort Pearson r; (4) normality cannot be assumed. Spearman rho is computed on the ranks of the data rather than raw values, making it more robust. For large samples with approximately normal, continuous data, Pearson r and Spearman rho usually give similar results.
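Since Spearman rho is simply Pearson r applied to ranks, it can be sketched with a ranking helper plus the usual formula. This is an illustrative implementation with hypothetical function names; tied values receive the average of the ranks they occupy, which is the usual convention.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Pearson r = {pearson_r(x, y):.4f}, Spearman rho = {spearman_rho(x, y):.4f}")
```

For the cubic data Spearman rho reaches 1 because the ordering is perfectly preserved, while Pearson r falls short of 1 because the points do not lie on a straight line.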

How do I enter data in Summary Stats mode?

In Summary Stats mode, enter six pre-computed values: n (number of pairs), Σx (sum of X values), Σy (sum of Y values), Σx² (sum of each X squared), Σy² (sum of each Y squared), and Σxy (sum of each x×y product). This mode is useful for textbook problems that provide summary statistics, for verifying published r values, or when working with aggregate data from statistical software output. The full regression line and p-value are computed from these six numbers.

📌 Quick Tips

💡 r close to 1 or -1 means a strong linear relationship. r close to 0 means little or no linear relationship. However, r = 0 does not rule out a strong non-linear relationship (e.g. a perfect symmetric U-shape gives r = 0).

💡 R-squared (r²) tells you the proportion of variance in Y explained by X. For r = 0.80, R² = 0.64 means X explains 64% of the variation in Y. The remaining 36% is due to other factors not in the model.

💡 Correlation does not imply causation. A high r between ice cream sales and drowning rates does not mean ice cream causes drowning. Both are driven by a common cause (hot weather). Always consider confounders before interpreting a correlation as causal.

💡 The p-value tests whether the true population correlation is zero. A significant p-value (p < 0.05) means a correlation this large would be unlikely to arise by chance at your sample size if the true correlation were zero. With very large samples, even tiny correlations (r = 0.05) can be statistically significant.

💡 Enter your X values in one textarea and the corresponding Y values in the other, separated by commas or spaces. The order matters: each xi is paired with the yi at the same position in the list.